Burf : The reboot!

Did you notice I didn’t say Burf.co? Well, there is a reason for that!

I hope (really hope) that this is the first article of many on the road to me returning to building stuff! It’s been a long road filled with many personal issues (divorce, for example), redundancy, and COVID; however, I feel that journey is now at an end. I am rather excited about the future and have already started a few projects.


So I have returned to Compsoft Creative, initially as a Senior Software Engineer (C# and Android), and was then promoted to Head of Software Engineering! Compsoft has always been a fantastic place to work and I hope to make it even better.


Yep, I have a new hobby and I am really enjoying it! I have as much fun fixing/restoring them as I do riding them. I have owned about 6 bikes in the last year, plus an awesome Robin Reliant Trike. It’s a great hobby with my father and takes me back to being a kid again.

GWiz of Death

One of my less popular purchases was a tiny electric car called a GWiz. New, they were terrible cars; mine had been left to die for many years, which made it just that little bit shitter. When it was delivered (as it didn’t work), it was a shed. I have had great fun trying to fix it and now it actually moves. The aim is to get it to do 60mph within a few minutes (or before I run out of road).


So for a few reasons (electric bills of £200pm, for one) I shut down Burf.co and turned everything off. I killed Vibe Innovation and gave up on anything interesting until home life had settled down a bit. As I said, I think it’s time to resume the coding projects, and I have turned Burf.co back on :). I hope to actually do some cool stuff with Burf.co in the coming months.

Inmoov robot

I did start restoring it in 2019/2020 but had to take a break from it. I did lots of stuff to it but just didn’t make any videos. Like Burf.co, I hope to get this moving again soon 🙂

The answer is 2,828,752,948

Well, I never thought I would get there, and it took a few attempts, but I managed to stick CommonCrawl’s text corpus of the Internet into MongoDB. This is around 13TB of text, which is definitely more than I can read in a day!

The next task is to work out the language of each page (I have a few ways to do that) and then ignore anything that is not English. I am not really sure how I would analyse text data in a language I do not understand, hence why I plan to skip it.
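As a rough illustration of one cheap way to guess whether a page is English, here is a minimal sketch that counts common English stop words. The word list, the 0.15 threshold and the `EnglishGuess` class are all made up for this example and are not taken from the Burf.co code; a proper language-detection library would do a far better job.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class EnglishGuess
{
    // A tiny, hypothetical list of very common English words (far from complete).
    private static readonly HashSet<string> StopWords =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase)
        {
            "the", "and", "of", "to", "in", "is", "that", "it",
            "for", "with", "was", "on", "this", "are"
        };

    // Fraction of the words in the text that appear in the stop-word list.
    public static double StopWordRatio(string text)
    {
        var words = text.Split(
            new[] { ' ', '\t', '\r', '\n', '.', ',', ';', ':', '!', '?' },
            StringSplitOptions.RemoveEmptyEntries);

        if (words.Length == 0) return 0.0;
        return words.Count(w => StopWords.Contains(w)) / (double)words.Length;
    }

    // Treat a page as English when enough of its words are stop words.
    // The default threshold is an arbitrary assumption, not a tuned value.
    public static bool LooksEnglish(string text, double threshold = 0.15)
        => StopWordRatio(text) >= threshold;
}
```

Pages where `LooksEnglish` returns false would simply be skipped. Short pages, or pages that mix languages, will fool a heuristic this simple.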

After that I need to try and work out what each page’s context is, then chuck it in ElasticSearch.

File Search

CommonCrawl also contains millions (63,270,007) of links to files like PDFs, Docs, and images. I have started processing this data to see what useful information I can extract.
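A first pass over those links could be as simple as bucketing them by file extension. This is only a sketch: the `FileLinks` class, the extension table and the bucket names are assumptions for illustration, not the actual processing code.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class FileLinks
{
    // Hypothetical mapping from file extension to a coarse file kind.
    private static readonly Dictionary<string, string> Kinds =
        new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase)
        {
            { ".pdf", "pdf" }, { ".doc", "doc" }, { ".docx", "doc" },
            { ".jpg", "image" }, { ".jpeg", "image" }, { ".png", "image" }, { ".gif", "image" }
        };

    // Classify one link by the extension of its path (the query string is ignored).
    public static string Classify(string url)
    {
        if (!Uri.TryCreate(url, UriKind.Absolute, out var uri)) return "invalid";
        var ext = Path.GetExtension(uri.AbsolutePath);
        return Kinds.TryGetValue(ext, out var kind) ? kind : "other";
    }

    // Count how many links fall into each kind.
    public static Dictionary<string, int> CountByKind(IEnumerable<string> urls)
        => urls.GroupBy(Classify).ToDictionary(g => g.Key, g => g.Count());
}
```

Running `CountByKind` over a sample of the links gives a quick feel for what is in there before deciding which file types are worth downloading.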

Burf.co Website

Shocking, I know, I think I need to hire someone to do a good job of it! Watch this space.

It’s been a while

So it’s been over 4 months since I last updated my blog. To me it seems a lot longer; maybe that’s to do with the Lockdown? Who knows?

So, what have I been up to, except drinking lots of beer like most people under Lockdown? Well, I have been semi-productive. I would say pretty productive; however, compared to some of my friends (one of whom launched a radio station while on lockdown), I will retire to semi-productive.

Inmoov Robot

For the first 3 months of the year, progress was fairly non-existent; however, thanks to the lockdown, I have come on leaps and bounds. I have rebuilt the head, neck and torso, fixed or improved tons of things, rewired most of it, and made a stand (which blew a hole in my ceiling and nearly killed me). I am waiting for some potentiometers (an absolute pain to find the right ones) for the arms, but I hope to have the biceps done soon. I have made a start printing the legs for it on my new CopyMaster 400 3D printer, which is pretty cool.

I have turned my conservatory into a robotics area especially for this robot which I have found has helped a lot.


So, on the mission to learn C# and Azure, I have completely rewritten (all 20 lines of it) the Burf.co Search Engine as Azure Functions written in C# and running in Docker containers. I actually really enjoyed doing this, and again it has come further in weeks than the old Java/Kotlin one did in months. I have Azure Functions that serve up the results to the website, parse and index websites, crawl sites, and even power a chatbot for fun 🙂 (SearchAI.uk)

I have even fired up the old HP DL580 server (currently keeping the house warm) to see if I can process data faster.

CommonCrawl WET File Processing

So I decided to try and write a script that would download the WET files from the CommonCrawl (56000 files, 8TB compressed).  These files contain 2.8 billion webpages or so and could be a really fun thing to process using ML etc.

Here is my V1.02 of this script. It’s hacky at best but it’s a start:


using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Net;
using System.Net.Http;
using System.Text;
using System.Threading;
using System.Threading.Tasks;
using ICSharpCode.SharpZipLib.GZip;
using MongoDB.Bson;
using MongoDB.Bson.Serialization.Attributes;
using MongoDB.Driver;
using Polly;

namespace SimonPlayGround
{
    class Program
    {
        private const string Path = "https://commoncrawl.s3.amazonaws.com/";
        private const string TargetFolder = @"z:\";
        public static int Jump = 2320; // how many files to skip as already done
        private const int Threads = 20;
        private const int Block = 10000;
        private const string UserId = "simon";

        public static async Task Main(string[] args)
        {
            var client = new WebClient();
            var paths = new List<string>();

            var mongo = new MongoClient(new MongoClientSettings()
            {
                Server = new MongoServerAddress(""), // server address left blank
                MaxConnectionPoolSize = 500
            });

            var db = mongo.GetDatabase("WEB");
            var collection = db.GetCollection<Page>("wet");

            // Load the list of WET file paths.
            // The loop body was missing from the listing; adding each line is assumed.
            foreach (var line in File.ReadLines(TargetFolder + "wet"))
            {
                paths.Add(line);
            }

            // hack to remove done ones
            for (int i = 0; i < Jump; i++)
            {
                paths.RemoveAt(paths.Count - 1);
            }

            var tasks = new List<Task>();

            // Start the first batch of workers.
            for (var i = 0; i < Threads; i++)
            {
                var filename = paths.Last();
                tasks.Add(Task.Run(() => Process(filename, collection)));
                paths.RemoveAt(paths.Count - 1);
            }

            // Each time a worker finishes, start another until no paths are left.
            while (tasks.Any())
            {
                await Task.WhenAny(tasks);
                var finishedTasks = tasks.Where(t => t.IsCompleted).ToList();
                foreach (var finishedTask in finishedTasks)
                {
                    // Drop the finished task so it is not counted again and the loop can end.
                    tasks.Remove(finishedTask);

                    if (paths.Count > 0)
                    {
                        var filename = paths.Last();
                        tasks.Add(Task.Run(() => Process(filename, collection)));
                        paths.RemoveAt(paths.Count - 1);
                        Console.WriteLine($"Left {paths.Count} {tasks.Count} {Jump}");
                        // todo write here the number of files done
                    }
                }
            }
        }

        public static async Task Process(string filename, IMongoCollection<Page> collection)
        {
            var file = await DownloadWetAsync(filename);

            await ParseWet(file, collection);
            Console.WriteLine($"FILE PROCESSED");
            Interlocked.Increment(ref Jump); // several workers update this at once
            // todo write here that file was completed
        }

        public static async Task ParseWet(string filename, IMongoCollection<Page> collection)
        {
            using StreamReader sr = File.OpenText(filename);
            string s;
            StringBuilder sb = new StringBuilder();
            var foundDoc = false;
            var foundURL = false;
            var url = string.Empty;
            var count = 0;
            var pages = new List<Page>();

            Console.WriteLine($"Processing {filename}");

            while ((s = sr.ReadLine()) != null)
            {
                if (foundDoc == false && s.Equals("WARC-Type: conversion"))
                {
                    // Start of a converted (plain text) page record.
                    sb.Append(s + Environment.NewLine);
                    foundDoc = true;
                }
                else if (foundDoc == true && s.Equals("WARC/1.0"))
                {
                    // Next record reached: everything after the Content-Length line is the page text.
                    var record = sb.ToString();
                    var from = record.IndexOf("Content-Length: ", StringComparison.Ordinal) + "Content-Length: ".Length;
                    var text = record[@from..];
                    var body = text.Substring(text.IndexOf(Environment.NewLine, StringComparison.Ordinal) + Environment.NewLine.Length);
                    sb.Clear(); // start the next record with an empty buffer
                    foundDoc = false;
                    foundURL = false;

                    // The listing is indented here as if inside a guard whose condition
                    // was lost; these statements run unconditionally in this version.
                    count += 1;
                    pages.Add(new Page()
                    {
                        Url = url,
                        Body = body
                    });

                    if (count % 1000 == 0)
                    {
                        Console.WriteLine($"Processed {count} {DateTime.Now}");
                    }

                    if (count == Block)
                    {
                        count = 0;
                        await BulkSave(pages, collection);
                        pages.Clear(); // do not save the same pages again with the next block
                        Console.WriteLine($"{Block} done {DateTime.Now}");
                    }
                }
                else if (foundDoc == true)
                {
                    sb.Append(s + Environment.NewLine);

                    if (foundURL == false && s.StartsWith("WARC-Target-URI: "))
                    {
                        var from = s.IndexOf("WARC-Target-URI: ", StringComparison.Ordinal) + "WARC-Target-URI: ".Length;
                        url = s[@from..s.Length];
                        foundURL = true;
                    }
                }
            }

            // save any left over
            if (pages.Count > 0)
            {
                await BulkSave(pages, collection);
            }
        }

        public static async Task BulkSave(List<Page> pages, IMongoCollection<Page> collection)
        {
            // Upsert keyed on URL: the body is only written when the page is first inserted.
            var updateOneModels = pages.Select(x =>
            {
                var filterDefinition = Builders<Page>.Filter.Eq(p => p.Url, x.Url);
                var updateDefinition = Builders<Page>.Update.SetOnInsert(p => p.Body, x.Body);

                return new UpdateOneModel<Page>(filterDefinition, updateDefinition) { IsUpsert = true };
            });

            var resultWrites = await collection.BulkWriteAsync(updateOneModels);
            Console.WriteLine($"OK?: {resultWrites.IsAcknowledged} - Inserted Count: {resultWrites.InsertedCount} {resultWrites.ModifiedCount}");
        }

        public class HttpRetryMessageHandler : DelegatingHandler
        {
            public HttpRetryMessageHandler(HttpClientHandler handler) : base(handler) { }

            // Retries failed requests with exponential back-off (Polly).
            // The head of the policy chain was missing from the listing;
            // Policy.Handle<HttpRequestException>() is an assumption.
            protected override Task<HttpResponseMessage> SendAsync(
                HttpRequestMessage request,
                CancellationToken cancellationToken) =>
                Policy
                    .Handle<HttpRequestException>()
                    .OrResult<HttpResponseMessage>(x => !x.IsSuccessStatusCode)
                    .WaitAndRetryAsync(10, retryAttempt => TimeSpan.FromSeconds(Math.Pow(3, retryAttempt)))
                    .ExecuteAsync(() => base.SendAsync(request, cancellationToken));
        }

        public static async Task<string> DownloadWetAsync(string line)
        {
            var filename = line.Split('/').Last();

            if (!File.Exists(TargetFolder + filename))
            {
                Console.WriteLine($"downloading {filename}");

                using (HttpClient client = new HttpClient(new HttpRetryMessageHandler(new HttpClientHandler())))
                using (HttpResponseMessage response = await client.GetAsync(Path + line, HttpCompletionOption.ResponseHeadersRead))
                using (Stream streamToReadFrom = await response.Content.ReadAsStreamAsync())
                using (Stream streamToWriteTo = File.Open(TargetFolder + filename, FileMode.Create))
                {
                    await streamToReadFrom.CopyToAsync(streamToWriteTo);
                }
            }
            else
            {
                Console.WriteLine($"GZ exist {filename}");
            }

            // Strip the ".gz" suffix to get the name of the decompressed file.
            var wetFile = TargetFolder + filename.Substring(0, filename.Length - 3);

            if (!File.Exists(wetFile))
            {
                Console.WriteLine($"Decompressing {filename}");
                DecompressGZip(TargetFolder + filename, wetFile);
            }
            else
            {
                Console.WriteLine($"WET exist {wetFile}");
            }

            return wetFile;
        }

        public static void DownloadPaths(WebClient client)
        {
            client.DownloadFile("https://commoncrawl.s3.amazonaws.com/crawl-data/CC-MAIN-2020-16/wet.paths.gz", TargetFolder + "wet.gz");
            DecompressGZip(TargetFolder + "wet.gz", TargetFolder + "wet");
        }

        public static void DecompressGZip(String fileRoot, String destRoot)
        {
            // Reads the whole decompressed file into memory before writing it out.
            using FileStream fileStream = new FileStream(fileRoot, FileMode.Open, FileAccess.Read);
            using GZipInputStream zipStream = new GZipInputStream(fileStream);
            using StreamReader sr = new StreamReader(zipStream);
            var data = sr.ReadToEnd();
            File.WriteAllText(destRoot, data);
        }
    }

    public class Page
    {
        [BsonId] public ObjectId Id { get; set; }
        [BsonElement("url")] public string Url { get; set; }
        [BsonElement("body")] public string Body { get; set; }
    }
}
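To make the parsing step easier to follow on its own, here is a minimal sketch of reading WET records from any text reader, with no MongoDB or file download involved. It assumes the record layout the script above relies on: a `WARC/1.0` line starts each record, header lines follow, and a blank line separates the headers from the page text. The `WetSketch` class and its `Parse` method are invented for this example and are not part of the script.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

public static class WetSketch
{
    // Returns (url, body) pairs for every "conversion" record in the input.
    public static List<(string Url, string Body)> Parse(TextReader reader)
    {
        var records = new List<(string Url, string Body)>();
        string url = null;
        var body = new StringBuilder();
        var isConversion = false;
        var inBody = false;
        string line;

        // Store the record collected so far (if it is a page) and reset the state.
        void Flush()
        {
            if (isConversion && url != null)
                records.Add((url, body.ToString().Trim()));
            url = null;
            body.Clear();
            isConversion = false;
            inBody = false;
        }

        while ((line = reader.ReadLine()) != null)
        {
            // A new record begins, so the previous one is complete.
            if (line == "WARC/1.0") { Flush(); continue; }

            if (inBody) { body.AppendLine(line); continue; }

            if (line.Length == 0)
                inBody = true; // blank line ends the headers
            else if (line == "WARC-Type: conversion")
                isConversion = true;
            else if (line.StartsWith("WARC-Target-URI: ", StringComparison.Ordinal))
                url = line.Substring("WARC-Target-URI: ".Length);
        }

        Flush(); // the last record has no following WARC/1.0 line
        return records;
    }
}
```

Like the script, this splits a record early if a page's own text happens to contain a line reading exactly `WARC/1.0`; honouring the `Content-Length` header would be the more robust approach.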


2019 can F*ck off! 2020, yeah baby!

Wow, it appears that I have not updated my blog since Feb 2019. At that time, life was great, brainpower was at a high, ideas were flowing and world domination (Burf Style) was near (Nope this isn’t a Brexit rant). Then life got turned upside down, and I had to deal (well try and deal) with a great crisis in my life, my wife sadly no longer wanted to be with me. 2019 was a horrible horrible year and so I didn’t really build anything, I shut down my sites, projects, and hobbies and just tried to deal with what was going on at home.

Long road ahead
So, I would love to write that there was a happy ending to 2019, but these things take years, especially when kids are involved, and the process has only just started. However, you just can’t keep my crazy brain down for too long! So while I was aimlessly surfing eBay for random things, I came across this for sale!

Inmoov Robot 3D Printed

Seeing an Inmoov robot for sale on eBay is as rare as rocking horse poo. It takes people years to print these (4 years for this one) and they are usually attached to a college, university or something. This one needed some work but I thought, what a perfect project for me to sink my efforts into! My dad agreed and purchased it as my Christmas present (Thanks dad).

The Inmoov robot project is one of the most amazing 3D printing projects out there. It allows someone with a lot of time (and some money to buy servos) to print a fully functional top half of a robot; to buy the same sort of thing would cost you hundreds of thousands of pounds. I have printed a few parts of it before to merge with educational products like LEGO and VEX; however, I always run out of steam (time) while actually attempting to print the whole thing. Buying a mostly built one means I can fix a bit at a time.

So this year’s focus is to complete this while learning more about robotics, engineering, and electronics :) I will try and do a YouTube series on my progress and I will still try to think of crazy ways to make Burf.co a better search engine.

Watch this space 🙂


Well that plan went Pete Tong!

So, in my last blog post, my servers had given me the middle finger and in the post before that, I had said that I was determined to do Robotics this year, stay focused and not take anything else on. Well, let’s just put them crazy thoughts into the bin, we all knew I was going to epically fail them!

So in no particular order, here’s what’s been going on:

The Server
So, the new DL580 server is now working. I won’t mention I took a power drill to it, but let’s just say it’s playing ball. It’s now got a new RAID card, an HP StorageWorks Array and 22TB of lovely storage ready for anything I may want to chuck at it. It’s currently turned off, but the reasons for that come up soon!

Short version: it’s the t*ts and it’s epically cool! Go install it now!
So OpenFaas is an Open Source Function as a Service framework (think AWS Lambda, Azure Functions etc) which runs on top of Docker/Kubernetes and allows a developer to focus on creating functions instead of worrying so much about infrastructure. It auto-scales, supports tons of programming languages and has great analytics. It’s free and runs on practically any hardware, including the Raspberry Pi. The Cloud addon to it (addon may not be the right word, maybe V2) makes things even easier by hooking into GitHub etc, and can then automatically deploy after a commit. What really got me excited about this was that I could have a single package containing Python, Java and C# (for example) functions and not need to worry about any of the infrastructure needed to make them an API. I then could use a single command to deploy it all to my server!
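To give a feel for how small one of these functions can be, here is a minimal sketch of a C# handler. It is modelled on the general shape of the OpenFaaS `csharp` template (a `FunctionHandler` class with a `Handle` method that receives the request body as a string), but treat the exact template layout as an assumption and check the current templates before relying on it. The word-count logic is just a made-up example.

```csharp
using System;

namespace Function
{
    public class FunctionHandler
    {
        // Receives the request body and returns the response body.
        // Here: the number of whitespace-separated words in the input.
        public string Handle(string input)
        {
            var words = (input ?? string.Empty).Split(
                new[] { ' ', '\t', '\r', '\n' },
                StringSplitOptions.RemoveEmptyEntries);

            return words.Length.ToString();
        }
    }
}
```

Everything else (the HTTP endpoint, scaling, packaging into a container) is the framework's job, which is the appeal described above.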

There are other free FaaS solutions out there (FN, Azure Functions Runtime, Openwhisk, Serverless etc) but OpenFaas is extremely popular and is only getting stronger (13K stars on GitHub), has a great set of examples including a functions store, and has a very helpful, active Slack channel for support etc.

So, the reason the server has been off is partly that I have been waiting for some hard drives to turn up, and partly because of OpenFaas. Before I had discovered FaaS, I was going to just build VMs (Hyper-V) for all the servers I want. However, all that has changed now due to OpenFaas and I would really like all future stuff to work serverless. Because most serverless frameworks seem to sit on top of Docker, some further research around Windows Server 2016 and containers needs to be done. The end result may be that I have to format my new server with Ubuntu, or create a Linux VM to host Docker on top of Windows. I still need to do some research as I believe WS2016 introduced mixed (Windows/Linux) containers for Docker.

Another Startup wants some Burf
Yup, I seem to be in demand, or it’s a way to stop me building Skynet (or some sort of killer robot). Another startup company, which is nearly ready to launch, has asked me to help them out and get them across the line. It’s a fantastic opportunity which I am pretty excited about, as I get to take ownership of the entire technology stack (which for once I don’t need to develop). My first challenge is around AWS and making sure their solution scales, which should be fun! I will disclose more soon!

My servers shit the bed!

So, this year I plan to focus on my robotics and try and get back into machine learning. With my usual way of working, I thought I would fire up the Blade Server (HP C7000) and start getting some environments going etc. Server says no! Well, its admin console does anyway, and this is how you control all the blades. So after a week of kicking it, I decided to get something more practical (and something I could actually lift). I got an HP DL580 Gen 7 + HP StorageWorks running 16TB of space! PERFECT

Round 2: More shitting of the bed!
So everything was going great except the StorageWorks Disk Array seemed a little slow; sometimes it would do 500mb a sec, then 700k??? Then it shut down a few times and now it doesn’t even switch on 🙁 Hopefully it will be replaced this week; however, I had just got used to Hyper-V and made a few VMs to replace my old servers.

I think once I have a server infrastructure that works even when it’s not a full moon, I am going to start rewriting the Burf.co search engine so that NLP (Natural Language Processing) is built in at the start. I am thinking of building a search engine just for science and technology articles.

In other news, I have been checking out YOLO (You Only Look Once) for image recognition using Tensorflow. It is super cool and I hope to use it soon with real-time video for my robots.

Robots, ROBOTS and more ROBOTS

So, Happy New Year everyone, I hope you all had a great Christmas! With the start of a new year come new challenges!

Last Year
So last year was a good year, new job, completed a lifelong goal to do a bodybuilding show, got a diploma and started Vibe Innovation/Burf.co Search Engine and some other things.

I didn’t however, do any robotics projects, I didn’t really build anything crazy and I didn’t release Hack24 etc.

The plans for this year
So I started writing down the usual New Year’s resolutions (Family, Grym Training, Money saving, 10 different projects, take over the world etc) and thought, hey, maybe I should just try and do less but better, as that worked very well for my bodybuilding show. I focused solely on that and succeeded. So I started thinking about all the stuff I enjoy that does not make me feel like I am doing work etc. I have a great job, so I need to also have great hobbies that are enjoyable and don’t feel like work 🙂

So I am turning my spare time back to building robots and whacky stuff but to a whole new level. Finally, I am actually getting round to designing them (as opposed to building them out of LEGO etc). I have already started using design tools like Fusion 360, I have printed out one of my own designs on my 3D printer, and tonight I did my first CNC design and fired it up on my CNC machine. I have made more progress this week than I did last year!

From the software point of view, I have already (last week) written a little Android app that is like Alexa (hot-word controlled, it streams audio to a server which then processes it). I also want to get back into ROS (Robot Operating System) so that my robot runs the industry standard. I do want to find a use for Burf but it will probably be linked to the robot somehow as a knowledge base.

I think the mindset change is that instead of thinking, will this project/thing improve me, it’s now, will this project/thing be fun.

So here is the current list of old and new projects (subject to change 🙂 )

Robots (FUN)
– Build my own custom robot (this is the big one)
– Fire up my old Lidar VEX one (for learning ROS, a small amount of work, helps above)
– Finish the VEX humanoid (I promised to do it and helps the other robotic projects)

– Burf.co : Development paused until I think of use (probably for robots) Likely to be re-written in .net core to learn for work
– Vibe Innovation: Needs a redesign and to really home in on what I want to offer (proof of concept work, AKA fun).
– Burf Development: This really should become Vibe I think, but for the moment, it’s not going anywhere

– Echo : I resigned from this before Christmas because I felt it needed a whole team of devs!
– Hack24 : paused, likely dead… LibGDX just seems to cause me issues!
– Build either a Car or Traction Engine out of Meccano, why, because it would be fun!

It may still seem to some a lot (6 projects) but a few of them wouldn’t take very long to do (e.g website update, Meccano vehicle), and then clear the way for everything else which all kinda link with each other.

Let’s see if I actually stick to it 🙂

The spinning of many plates

Oh boy, these updates are going to get crazier!

The sale(s)

So first up I sold 2 copies of the Keyword Research Software which is on offer at KeywordResearchNinja.co.uk. The odd thing about this is that I hadn’t even finished the site for people to access the software, because I assumed no one would ever find it until I wanted them to! I hadn’t even finished setting up my business PayPal account, which is needed to make a payment link 🙁 This leads me on to…

Paperwork Time!

So, my plan is to learn how to run a business through Vibe Innovation.  For me to do this, I need a business bank account, which I also need for my PayPal Business Account.  I looked online for good deals on business bank accounts (e.g ones with little or no monthly cost) and went with Tide.  Now for them to accept me, I need to have an updated Confirmation Statement logged with Companies House as the current one reflects the previous owners/shareholders.  It was fairly easy to do but (I say that, it’s not gone through yet) this whole process is to set up an account, to finish setting up my PayPal account.

My Garage on Google Maps

While trying to achieve something, (which I have completely forgotten what it was) I got sidetracked on to listing Vibe Innovation on Google Maps.  Google makes it super easy to list your business however, it picks a picture of where it thinks that address is.  The current picture is of some white garages which is a great introduction to new customers 🙂


I am really thinking about the concept of focusing! I really want to focus on the concept of focusing on something. At the moment I seem to be doing far too many small things. I do think I am making some sort of progress, but I think my New Year’s resolutions will be around doing less but doing the things I do better.

Echo Music 

Early morning coding (5 am) is going rather well.  Still waiting for design assets from the team but getting there.


Making a little bit of progress, however I have a plan and it’s in this video:


Busy week, nothing achieved!

The title of this week’s post is probably a little unfair on myself. Lots of stuff got worked on last week but there are no milestones to talk about. After reading a blog post on productivity, I think I need to start focusing on fewer things to achieve more. My constant context switching is fun but less productive.

Vibe Innovation

So this is my idea of starting a proper business and learning the ropes of a limited company.  The idea is to provide a technology prototyping service, basically making proof of concepts for people.  Progress this week has been around getting a site up and running (VibeInnovation.com), talking to accountants and stuff and preparing my business PayPal account.


I am trying to automate as much of this as possible and just make it run itself.  After that, I need to design a service or product that it offers that people will want and pay for.


Progress on this has been pretty good, for the MVP I have decided to use Google’s Firebase to handle the backend communication and data.  Once the paperwork is done for Vibe, my primary focus will be on Echo.


Paused 🙁

Keyword Research Ninja

So I have updated the costs of the product, linked it to my business PayPal account and created a product page.  All I need to do next is link it to the product and the site is done.


Finally started playing around with the VEX EDR v5 hardware, ready to make a large epic robot 🙂

Crazy times with Innovation and Ninjas

I have a plan, it involves learning, innovation and contains the word Ninja? Confused, read on!

So, this week’s blog post will probably be more confusing than most. I had a few ideas floating around in my head that have formed into a kind of business idea (maybe even a plan).


First off, I was looking to buy or start a proper business; what I mean by proper is that it is a limited company registered with Companies House. Why, you ask? Well, I thought it would be a good thing to learn about. I thought a good place to start was buying something random off eBay that was already set up. I hadn’t really worked out what to buy, but something online, maybe marketing, SEO or something around that area. I randomly bought the site Keyword Research Ninja for less than a Domino’s pizza. It sells keyword research software written in C#. For me, the worst case scenario of buying this site is that I have some fun with the software. However, this does not fulfill my idea of getting a limited company, and the site does need some work.

Vibe Marketing Limited

So, also on eBay, they sell dormant companies (again, not sure why I didn’t just set up a new one), and I came across someone selling Vibe Marketing Limited. They wanted quite a bit of money for a name; after some strong negotiation, I bought it for less than a Domino’s pizza! I thought, hey, I can get back into SEO (I used to love doing that in my spare time) and do it part-time for fun, just to learn the ropes of running a company.

Coffee with an old friend

I had been in contact with an old friend who used to work at Compsoft when I first joined. He was a member of R&D and went off to finish his degree, start a company and do pretty well in life. Anyway, we met up for Costa (Starbucks was too far away) and had a chat about life, work, and code. He had set up a successful HR company that his wife runs, and he now does PoC work for fun to keep him actively coding. I mentioned to him that I really enjoyed my time at O2’s Lab doing innovation and PoC work, and that I missed the crazy times of making stuff work in new tech and then chucking it over the wall for a dev team to properly implement if the business thought it was viable.

The penny drops.

So, I decided that my next plan of action was to take this limited company (in the process of being renamed to Vibe Innovation) and set it up as a consultancy company that creates proof of concepts for people. I want to start really small with really small goals; this is still a learning process. This year’s aim (November / December) is to create the website, set up email and show some of the projects I have done in the past. Next year’s goal is simple: do 1 paid bit of work and work out how to process it through the books.

Other news

So work on Echo Music Group is actually progressing nicely.  It’s great fun getting back into doing iOS work.  I am currently working on the signup process.

Elasticsearch definitely seems to be the way to go with Burf.co which has been working well as a search engine.  I need to improve the quality of the dataset (which is in progress) and fix the site to work on mobile.  It has had over 16,000 sites manually submitted to it in the last week which is pretty epic.
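For readers who have not used Elasticsearch, a search boils down to sending a small JSON query to the index. Here is a minimal sketch that builds a basic `match` query body with System.Text.Json. The `SearchQuery` class and the `body` field name are assumptions for illustration; the real Burf.co index and query shape are not shown in this post.

```csharp
using System.Collections.Generic;
using System.Text.Json;

public static class SearchQuery
{
    // Builds the JSON body for a simple full-text "match" query on one field.
    public static string BuildMatchQuery(string field, string text, int size = 10)
    {
        var query = new Dictionary<string, object>
        {
            ["size"] = size, // maximum number of hits to return
            ["query"] = new Dictionary<string, object>
            {
                ["match"] = new Dictionary<string, object> { [field] = text }
            }
        };

        return JsonSerializer.Serialize(query);
    }
}
```

The resulting string would be posted to the index's `_search` endpoint; improving the dataset then mostly means improving what ends up in the field being matched.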

I have also upped my reading for a second week in a row. I am currently reading The $100 Startup…. can’t imagine why 🙂