It’s been a while

So it’s been over 4 months since I last updated my blog. To me it seems a lot longer; maybe that’s to do with the lockdown? Who knows?

So, what have I been up to, except drinking lots of beer like most people under lockdown? Well, I have been semi-productive. I would say pretty productive; however, compared to some of my friends (one of whom launched a radio station while on lockdown), I will retire to semi-productive.

Inmoov Robot

For the first 3 months of the year, progress was fairly non-existent; however, thanks to the lockdown, I have come on leaps and bounds. I have rebuilt the head, neck and torso, fixed or improved tons of things, rewired most of it, and made a stand (which blew a hole in my ceiling and nearly killed me). I am waiting for some potentiometers (an absolute pain to find the right ones) for the arms, but I hope to have the biceps done soon. I have made a start on printing the legs with my new CopyMaster 400 3D printer, which is pretty cool.

I have turned my conservatory into a robotics area especially for this robot which I have found has helped a lot.

So, on the mission to learn C# and Azure, I have completely rewritten the Search Engine (all 20 lines of it) as Azure Functions written in C# and running in Docker containers. I actually really enjoyed doing this, and it has come further in weeks than the old Java/Kotlin one did in months. I have Azure Functions that serve up the results to the website, parse and index websites, crawl sites, and there is even a chatbot for fun 🙂

I have even fired up the old HP DL580 server (currently keeping the house warm) to see if I can process data faster.

CommonCrawl WET File Processing

So I decided to try and write a script that would download the WET files from the CommonCrawl (56000 files, 8TB compressed).  These files contain 2.8 billion webpages or so and could be a really fun thing to process using ML etc.

Here is my V1.02 of this script; it’s hacky at best but it’s a start:


using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Net;
using System.Net.Http;
using System.Text;
using System.Threading;
using System.Threading.Tasks;
using ICSharpCode.SharpZipLib.GZip;
using MongoDB.Bson;
using MongoDB.Bson.Serialization.Attributes;
using MongoDB.Driver;
using Polly;

namespace SimonPlayGround
{
    class Program
    {
        private const string Path = "";              // base URL that the WET paths are relative to
        private const string TargetFolder = @"z:\";
        public static int Jump = 2320;               // number of files already processed
        private const int Threads = 20;              // files processed in parallel
        private const int Block = 10000;             // pages per bulk write
        private const string UserId = "simon";

        public static async Task Main(string[] args)
        {
            // Only needed by DownloadPaths(client), which fetches the path list on a first run.
            var client = new WebClient();
            var paths = new List<string>();

            var mongo = new MongoClient(new MongoClientSettings()
            {
                Server = new MongoServerAddress(""),
                MaxConnectionPoolSize = 500
            });

            var db = mongo.GetDatabase("WEB");
            var collection = db.GetCollection<Page>("wet");

            // the path list holds one WET file path per line
            foreach (var line in File.ReadLines(TargetFolder + "wet"))
            {
                paths.Add(line);
            }

            // hack to remove done ones
            for (int i = 0; i < Jump && paths.Count > 0; i++)
            {
                paths.RemoveAt(paths.Count - 1);
            }

            var tasks = new List<Task>();

            // start the first batch of workers
            for (var i = 0; i < Threads && paths.Count > 0; i++)
            {
                var filename = paths.Last();
                tasks.Add(Task.Run(() => Process(filename, collection)));
                paths.RemoveAt(paths.Count - 1);
            }

            // every time a worker finishes, hand the next file to a new one
            while (tasks.Any())
            {
                await Task.WhenAny(tasks);
                var finishedTasks = tasks.Where(t => t.IsCompleted).ToList();
                foreach (var finishedTask in finishedTasks)
                {
                    // drop the finished task, otherwise the loop never ends
                    tasks.Remove(finishedTask);
                    if (finishedTask.IsFaulted)
                    {
                        Console.WriteLine($"FILE FAILED {finishedTask.Exception?.GetBaseException().Message}");
                    }

                    if (paths.Count > 0)
                    {
                        var filename = paths.Last();
                        tasks.Add(Task.Run(() => Process(filename, collection)));
                        paths.RemoveAt(paths.Count - 1);
                        Console.WriteLine($"Left {paths.Count} {tasks.Count} {Jump}");
                        // todo write here the number of files done
                    }
                }
            }
        }

        public static async Task Process(string filename, IMongoCollection<Page> collection)
        {
            var file = await DownloadWetAsync(filename);

            await ParseWet(file, collection);
            Console.WriteLine("FILE PROCESSED");
            // workers run in parallel, so a plain += 1 could lose updates
            Interlocked.Increment(ref Jump);
            // todo write here that file was completed
        }

        public static async Task ParseWet(string filename, IMongoCollection<Page> collection)
        {
            const string lengthHeader = "Content-Length: ";
            const string uriHeader = "WARC-Target-URI: ";

            using StreamReader sr = File.OpenText(filename);
            string s;
            StringBuilder sb = new StringBuilder();
            var foundDoc = false;
            var foundURL = false;
            var url = string.Empty;
            var count = 0;
            var pages = new List<Page>();

            Console.WriteLine($"Processing {filename}");

            while ((s = sr.ReadLine()) != null)
            {
                if (foundDoc == false && s.Equals("WARC-Type: conversion"))
                {
                    // start of a page record
                    sb.Append(s + Environment.NewLine);
                    foundDoc = true;
                }
                else if (foundDoc == true && s.Equals("WARC/1.0"))
                {
                    // the next record has started, so the buffered page is complete
                    var record = sb.ToString();
                    var start = record.IndexOf(lengthHeader, StringComparison.Ordinal) + lengthHeader.Length;
                    var text = record[start..];
                    // the body starts on the line after the Content-Length value
                    var body = text.Substring(text.IndexOf(Environment.NewLine, StringComparison.Ordinal) + Environment.NewLine.Length);
                    sb.Clear();
                    foundDoc = false;
                    foundURL = false;

                    // skip records that had no target URI
                    if (url.Length > 0)
                    {
                        count += 1;
                        pages.Add(new Page()
                        {
                            Url = url,
                            Body = body
                        });
                        url = string.Empty;

                        if (count % 1000 == 0)
                        {
                            Console.WriteLine($"Processed {count} {DateTime.Now}");
                        }

                        if (count == Block)
                        {
                            count = 0;
                            await BulkSave(pages, collection);
                            pages.Clear();   // otherwise every later save resends the same pages
                            Console.WriteLine($"{Block} done {DateTime.Now}");
                        }
                    }
                }
                else if (foundDoc == true)
                {
                    sb.Append(s + Environment.NewLine);

                    if (foundURL == false && s.StartsWith(uriHeader, StringComparison.Ordinal))
                    {
                        url = s.Substring(uriHeader.Length);
                        foundURL = true;
                    }
                }
            }

            // save any left over
            if (pages.Count > 0)
            {
                await BulkSave(pages, collection);
            }
        }

        public static async Task BulkSave(List<Page> pages, IMongoCollection<Page> collection)
        {
            try
            {
                // upsert keyed on the URL, so re-running a file does not duplicate pages
                var updateOneModels = pages.Select(x =>
                {
                    var filterDefinition = Builders<Page>.Filter.Eq(p => p.Url, x.Url);
                    var updateDefinition = Builders<Page>.Update.SetOnInsert(p => p.Body, x.Body);

                    return new UpdateOneModel<Page>(filterDefinition, updateDefinition) { IsUpsert = true };
                }).ToList();

                var resultWrites = await collection.BulkWriteAsync(updateOneModels);
                // new pages arrive as upserts, so InsertedCount would always read 0 here
                Console.WriteLine($"OK?: {resultWrites.IsAcknowledged} - Upserted Count: {resultWrites.Upserts.Count} {resultWrites.ModifiedCount}");
            }
            catch (Exception e)
            {
                Console.WriteLine($"Bulk save failed: {e.Message}");
            }
        }

        // Retries failed requests up to 10 times, waiting 3^attempt seconds between tries.
        public class HttpRetryMessageHandler : DelegatingHandler
        {
            public HttpRetryMessageHandler(HttpClientHandler handler) : base(handler) { }

            protected override Task<HttpResponseMessage> SendAsync(
                HttpRequestMessage request,
                CancellationToken cancellationToken) =>
                Policy
                    .Handle<HttpRequestException>()
                    .OrResult<HttpResponseMessage>(x => !x.IsSuccessStatusCode)
                    .WaitAndRetryAsync(10, retryAttempt => TimeSpan.FromSeconds(Math.Pow(3, retryAttempt)))
                    .ExecuteAsync(() => base.SendAsync(request, cancellationToken));
        }

        public static async Task<string> DownloadWetAsync(string line)
        {
            var filename = line.Split('/').Last();

            if (!File.Exists(TargetFolder + filename))
            {
                Console.WriteLine($"downloading {filename}");

                using (HttpClient client = new HttpClient(new HttpRetryMessageHandler(new HttpClientHandler())))
                using (HttpResponseMessage response = await client.GetAsync(Path + line, HttpCompletionOption.ResponseHeadersRead))
                {
                    // do not save an error page as if it were the archive
                    response.EnsureSuccessStatusCode();

                    using (Stream streamToReadFrom = await response.Content.ReadAsStreamAsync())
                    using (Stream streamToWriteTo = File.Open(TargetFolder + filename, FileMode.Create))
                    {
                        await streamToReadFrom.CopyToAsync(streamToWriteTo);
                    }
                }
            }
            else
            {
                Console.WriteLine($"GZ exist {filename}");
            }

            // strip the ".gz" extension to get the decompressed file name
            var wetFile = TargetFolder + filename.Substring(0, filename.Length - 3);

            if (!File.Exists(wetFile))
            {
                Console.WriteLine($"Decompressing {filename}");
                DecompressGZip(TargetFolder + filename, wetFile);
            }
            else
            {
                Console.WriteLine($"WET exist {wetFile}");
            }

            return wetFile;
        }

        public static void DownloadPaths(WebClient client)
        {
            client.DownloadFile("", TargetFolder + "wet.gz");
            DecompressGZip(TargetFolder + "wet.gz", TargetFolder + "wet");
        }

        public static void DecompressGZip(String fileRoot, String destRoot)
        {
            // stream straight to disk instead of holding the whole file in memory as a string
            using FileStream fileStream = new FileStream(fileRoot, FileMode.Open, FileAccess.Read);
            using GZipInputStream zipStream = new GZipInputStream(fileStream);
            using FileStream output = File.Create(destRoot);
            zipStream.CopyTo(output);
        }
    }

    public class Page
    {
        [BsonId] public ObjectId Id { get; set; }
        [BsonElement("url")] public string Url { get; set; }
        [BsonElement("body")] public string Body { get; set; }
    }
}
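
For anyone who wants to poke at a WET file without firing up the whole pipeline, the parsing idea in the script above can be boiled down quite a bit: treat every WARC/1.0 line as the start of a record, read the headers up to the first blank line, and keep only the conversion records. This is a rough sketch in Python (just to keep it short), not the script I actually run; parse_wet and the sample text are made up for illustration, and like the C# version it would be confused by a page whose body contains a line that is exactly WARC/1.0.

```python
import io


def parse_wet(stream):
    """Yield (url, body) for every 'conversion' record in a WET text stream."""
    headers, body_lines, in_body = {}, [], False

    def finish():
        # Only conversion records carry extracted page text.
        if headers.get("WARC-Type") == "conversion" and "WARC-Target-URI" in headers:
            return headers["WARC-Target-URI"], "\n".join(body_lines).strip()
        return None

    for raw in stream:
        line = raw.rstrip("\r\n")
        if line == "WARC/1.0":            # a new record starts here
            record = finish()
            if record:
                yield record
            headers, body_lines, in_body = {}, [], False
        elif not in_body:
            if line == "":                # blank line ends the header block
                in_body = bool(headers)
            else:
                key, _, value = line.partition(": ")
                headers[key] = value
        else:
            body_lines.append(line)

    record = finish()                     # the last record has no marker after it
    if record:
        yield record


# Tiny hand-written sample in the shape of a WET file (not real crawl data).
SAMPLE = """WARC/1.0
WARC-Type: warcinfo
Content-Length: 5

hello

WARC/1.0
WARC-Type: conversion
WARC-Target-URI: http://example.com/
Content-Length: 11

Hello world

WARC/1.0
WARC-Type: conversion
WARC-Target-URI: http://example.org/a
Content-Length: 9

Last page
"""

pages = list(parse_wet(io.StringIO(SAMPLE)))
for url, body in pages:
    print(url, "->", body)
```

A sturdier parser would read exactly Content-Length bytes per record rather than scanning for the marker line, but for a quick look at the data this is usually enough.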


2019 can F*ck off! 2020, yeah baby!

Wow, it appears that I have not updated my blog since Feb 2019. At that time, life was great, brainpower was at a high, ideas were flowing and world domination (Burf Style) was near (nope, this isn’t a Brexit rant). Then life got turned upside down, and I had to deal (well, try and deal) with a great crisis in my life: my wife sadly no longer wanted to be with me. 2019 was a horrible, horrible year, so I didn’t really build anything. I shut down my sites, projects, and hobbies and just tried to deal with what was going on at home.

Long road ahead
So, I would love to write that there was a happy ending to 2019, but these things take years, especially when kids are involved, and the process has only just started. However, you just can’t keep my crazy brain down for too long! So while I was aimlessly surfing eBay for random things, I came across this for sale!

Inmoov Robot 3D Printed

Seeing an Inmoov robot for sale on eBay is as rare as rocking horse poo. It takes people years to print these (4 years for this one) and they are usually attached to a college, university or something. This one needed some work, but I thought what a perfect project for me to sink my efforts into! My dad agreed and purchased it as my Christmas present (thanks, Dad).

The Inmoov robot project is one of the most amazing 3D projects out there. It allows someone with a lot of time (and some money to buy servos) to print a fully functional top half of a robot. To buy the same sort of thing would cost you hundreds of thousands of pounds. I have printed a few parts of it before to merge with educational products like LEGO and VEX; however, I always ran out of steam (time) while actually attempting to print the whole thing. Buying a mostly built one means I can fix a bit at a time.

So this year’s focus is to complete this while learning more about robotics, engineering, and electronics:) I will try and do a YouTube series on my progress and I will still try to think of crazy ways to try and make a better search engine.

Watch this space 🙂


Well that plan went Pete Tong!

So, in my last blog post, my servers had given me the middle finger, and in the post before that, I had said that I was determined to do robotics this year, stay focused and not take anything else on. Well, let’s just put them crazy thoughts into the bin; we all knew I was going to epically fail them!

So in no particular order, here’s what’s been going on:

The Server
So, the new DL580 server is now working. I won’t mention I took a power drill to it, but let’s just say it’s playing ball. It’s now got a new RAID card, an HP StorageWorks array and 22TB of lovely storage ready for anything I may want to chuck at it. It’s currently turned off, but the reasons for that come up soon!

Short version: it’s the t*ts and is epically cool! Go install it now!
So OpenFaaS is an open source Function as a Service framework (think AWS Lambda, Azure Functions etc) which runs on top of Docker/Kubernetes and lets a developer focus on writing functions rather than on infrastructure. It auto-scales, supports tons of programming languages and has great analytics. It’s free and runs on practically any hardware, including the Raspberry Pi. The Cloud addon to it (addon may not be the right word, maybe V2) makes things even easier by hooking into GitHub etc, so it can automatically deploy after a commit. What really got me excited about this was that I could have a single package containing Python, Java and C# (for example) functions and not need to worry about any of the infrastructure needed to make them an API. I could then use a single command to deploy it all to my server!
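
To give a feel for how little code a function needs, here is a minimal sketch of a Python one. As far as I know, the classic python3 template expects a handler.py that exposes handle(req), where req is the raw request body as a string; treat that shape, the JSON payload and the greeting as assumptions for illustration rather than anything from my actual functions.

```python
import json


def handle(req):
    """Entry point: takes the raw request body (a string), returns the response body."""
    try:
        payload = json.loads(req) if req else {}
    except json.JSONDecodeError:
        return json.dumps({"error": "body must be JSON"})

    if not isinstance(payload, dict):
        return json.dumps({"error": "body must be a JSON object"})

    name = payload.get("name", "world")
    return json.dumps({"message": f"Hello, {name}!"})


# Calling it locally, the same way the function wrapper would.
print(handle('{"name": "Burf"}'))
```

Everything else (the HTTP endpoint, scaling, metrics) is the framework’s job, which is exactly the appeal.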

There are other free FaaS solutions out there (Fn, Azure Functions Runtime, OpenWhisk, Serverless etc) but OpenFaaS is extremely popular and is only getting stronger (13K stars on GitHub). It has a great set of examples, including a function store, and a very helpful, active Slack channel for support etc.

So, the reason the server has been off is partly that I have been waiting for some hard drives to turn up, and partly because of OpenFaaS. Before I had discovered FaaS, I was going to just build VMs (Hyper-V) for all the servers I want. However, all that has changed now due to OpenFaaS, and I would really like all future stuff to work serverless. Because most serverless frameworks seem to sit on top of Docker, there is some further research to be done around Windows Server 2016 and containers. The end result may be that I have to format my new server with Ubuntu, or create a Linux VM to host Docker on top of Windows. I still need to do some research, as I believe WS2016 introduced mixed (Windows/Linux) containers for Docker.

Another Startup wants some Burf
Yup, I seem to be in demand, or it’s a way to stop me building Skynet (or some sort of killer robot). Another startup company, which is nearly ready to launch, has asked me to help them out and get them across the line. It’s a fantastic opportunity which I am pretty excited about, as I get to take ownership of the entire technology stack (which for once I don’t need to develop). My first challenge is around AWS and making sure their solution scales, which should be fun! I will disclose more soon!

The spinning of many plates

Oh boy, these updates are going to get crazier!

The sale(s)

So first up I sold 2 copies of the Keyword Research Software. The odd thing about this is that I hadn’t even finished the site for people to access the software, because I assumed no one would ever find it until I wanted them to! I hadn’t even finished setting up my business PayPal account, which is needed to make a payment link 🙁 This leads me on to…

Paperwork Time!

So, my plan is to learn how to run a business through Vibe Innovation. For me to do this, I need a business bank account, which I also need for my PayPal Business Account. I looked online for good deals on business bank accounts (e.g. ones with little or no monthly cost) and went with Tide. Now for them to accept me, I need to have an updated Confirmation Statement logged with Companies House, as the current one reflects the previous owners/shareholders. It was fairly easy to do (I say that, but it’s not gone through yet). This whole process is to set up a bank account, just so I can finish setting up my PayPal account.

My Garage on Google Maps

While trying to achieve something, (which I have completely forgotten what it was) I got sidetracked on to listing Vibe Innovation on Google Maps.  Google makes it super easy to list your business however, it picks a picture of where it thinks that address is.  The current picture is of some white garages which is a great introduction to new customers 🙂


I am really thinking about the concept of focusing! I really want to focus on the concept of focusing on something. At the moment I seem to be doing far too many small things. I do think I am making some sort of progress, but I think my New Year’s resolutions will be around doing less but doing the things I do better.

Echo Music 

Early morning coding (5 am) is going rather well.  Still waiting for design assets from the team but getting there.


Making a little bit of progress, however I have a plan and it’s in this video:


Busy week, nothing achieved!

The title of this week’s post is probably a little unfair on myself. Lots of stuff got worked on last week but there are no milestones to talk about. After reading a blog post on productivity, I think I need to start focusing on fewer things to achieve more. My constant context switching is fun but less productive.

Vibe Innovation

So this is my idea of starting a proper business and learning the ropes of a limited company. The idea is to provide a technology prototyping service, basically making proof of concepts for people. Progress this week has been around getting a site up and running, talking to accountants and stuff, and preparing my business PayPal account.

I am trying to automate as much of this as possible and just make it run itself.  After that, I need to design a service or product that it offers that people will want and pay for.


Progress on this has been pretty good, for the MVP I have decided to use Google’s Firebase to handle the backend communication and data.  Once the paperwork is done for Vibe, my primary focus will be on Echo.


Paused 🙁

Keyword Research Ninja

So I have updated the costs of the product, linked it to my business PayPal account and created a product page.  All I need to do next is link it to the product and the site is done.


Finally started playing around with the VEX EDR v5 hardware, ready to make a large epic robot 🙂

Crazy times with Innovation and Ninjas

I have a plan, it involves learning, innovation and contains the word Ninja? Confused, read on!

So, this week’s blog post will probably be more confusing than most. I had a few ideas floating around in my head that have formed into a kind of business idea (maybe even a plan).


First off, I was looking to buy or start a proper business. What I mean by proper is that it is a limited company registered with Companies House. Why, you ask? Well, I thought it would be a good thing to learn about. I thought a good place to start was buying something random off eBay that was already set up. I hadn’t really worked out what to buy, but something online, maybe marketing, SEO or something around that area. I randomly bought the site Keyword Research Ninja for less than a Domino’s pizza. It sells keyword research software written in C#. For me, the worst-case scenario of buying this site is that I have some fun with the software. However, this does not fulfil my idea of getting a limited company, and the site does need some work.

Vibe Marketing Limited

So, also on eBay, they sell dormant companies (again, not sure why I didn’t just set up a new one), and I came across someone selling Vibe Marketing Limited. They wanted quite a bit of money for a name, but after some strong negotiation I bought it for less than a Domino’s pizza! I thought, hey, I can get back into SEO (I used to love doing that in my spare time) and do it part-time for fun, just to learn the ropes of running a company.

Coffee with an old friend

I had been in contact with an old friend who used to work at Compsoft when I first joined. He was a member of R&D and went off to finish his degree, start a company and do pretty well in life. Anyway, we met up for a Costa (Starbucks was too far away) and had a chat about life, work, and code. He had set up a successful HR company that his wife runs, and he now does PoC work for fun to keep him actively coding. I mentioned to him that I really enjoyed my time at O2’s Lab doing innovation and PoC work, and that I missed the crazy times of making stuff work in new tech and then chucking it over the wall for a dev team to properly implement if the business thought it was viable.

The penny drops.

So, I decided that my next plan of action was to take this limited company (in the process of being renamed to Vibe Innovation) and set it up as a consultancy that creates proof of concepts for people. I want to start really small with really small goals; this is still a learning process. This year’s aim (November / December) is to create the website, set up email and show some of the projects I have done in the past. Next year’s goal is simple: do 1 paid bit of work and work out how to process it through the books.

Other news

So work on Echo Music Group is actually progressing nicely.  It’s great fun getting back into doing iOS work.  I am currently working on the signup process.

Elasticsearch definitely seems to be the way to go; it has been working well as a search engine. I need to improve the quality of the dataset (which is in progress) and fix the site to work on mobile. It has had over 16,000 sites manually submitted to it in the last week, which is pretty epic.

I have also upped my reading for a second week in a row. I am currently reading The $100 Startup… can’t imagine why 🙂

This week’s update: Just keep spinning!

So, as the wife is in Portugal getting some sun and I am home alone with the kids, who are now finally asleep, I thought I would review the week(s).

There is not much to report except a realisation that creating a new Burf Search Engine is gonna be a lot of work! Even taking a fairly small chunk of it (100 million pages), I just can’t produce results fast enough using MongoDB. So I am thinking about a few key points I want to focus on.

  • Must be fast
  • Must be fairly useful and produce useful results
  • Must update itself
  • Must have a niche

So at the moment, I take the first 100,000,000 URLs from the CommonCrawl that returned HTTP status of 200 and are marked as English.  The CommonCrawl contains I believe around 2.3 billion URLs and so what I may do is filter which ones I want.  I could also build up a simple list of top sites (BBC, Wikipedia, MSN etc) and just index them once a week.
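
The filtering step is simple enough to sketch. This assumes each index record has already been loaded as a dictionary with url, status and languages fields (the field names, the comma-separated language codes and the sample records are assumptions for illustration; the real index layout should be checked before relying on this). It is written in Python purely as a sketch.

```python
from itertools import islice


def english_ok_urls(records, limit):
    """Return an iterator over at most `limit` URLs that returned HTTP 200 and list English."""
    wanted = (
        r["url"]
        for r in records
        if str(r.get("status")) == "200"
        and "eng" in (r.get("languages") or "").split(",")
    )
    return islice(wanted, limit)


# Made-up records standing in for lines of the index.
records = [
    {"url": "http://a.example/", "status": "200", "languages": "eng"},
    {"url": "http://b.example/", "status": "404", "languages": "eng"},
    {"url": "http://c.example/", "status": "200", "languages": "fra"},
    {"url": "http://d.example/", "status": "200", "languages": "eng,fra"},
    {"url": "http://e.example/", "status": "200", "languages": None},
    {"url": "http://f.example/", "status": "200", "languages": "eng"},
]

first_two = list(english_ok_urls(records, 2))
print(first_two)
```

Because it is lazy, the same function works for the first 100,000,000 URLs without loading the whole index into memory; a niche filter (only certain domains, say) would just be one more condition in the generator.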

I also need to think about the people who submit their sites; I need to at least action them. I am thinking of moving to Elasticsearch just to speed everything up. Everything seems slow in MongoDB past 1 million records, even on an SSD.

I have also found myself regressing a bit to my old ways of buying domains, looking at turnkey websites and SEO tips and tricks.  I used to love this and had over 100 sites at one point.

On a side note, the site has a new design and is now hosted remotely 🙂

Echo MG

So iOS development has started on the MVP, which is good; more would have happened if the search engine had played ball. This is far more important than the search engine, so if it comes to it, I will turn that off to focus. It’s nice to do some iOS development again.


This had been parked; however, there is a game jam coming up this Friday, and I think with a lot of Red Bull and sugar I may be able to actually complete the MVP and get it uploaded, which would be great.


So some bits have been printed but I really need to sort the garage out so that I can then sort out all the EDR parts.  Too many jobs, not enough time 🙁

New, Business Success Diploma and building furniture!

So, I am trying my best to make sure I update my blog regularly so that I can look back and see what I have achieved.

Now with no search button

So between spending over 12 hours building furniture for the wife (2 new bunk beds, 2 desks, 2 chairs and a bookcase), I also managed to build a new MVP of the search engine. The previous version was written in Swift and was all in one (frontend/backend). The new one is properly structured: it uses VueJS for the frontend, hosted externally, and the backend is in Java Spring Boot. It is just a prototype at the moment and doesn’t even have a search button; you just type and it starts getting the results (and smashing the db). It was a fun prototype to build and is definitely the way I want to go.
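
One common way to stop search-as-you-type from smashing the db is to debounce the input: only run the query once the user has stopped typing for a moment. In the prototype this would live in the VueJS frontend; the sketch below is in Python with the clock passed in by hand so the behaviour is easy to follow. The Debouncer class, its method names and the 300 ms delay are all made up for illustration.

```python
class Debouncer:
    """Collapse a burst of keystrokes into one query once typing pauses."""

    def __init__(self, delay_ms):
        self.delay_ms = delay_ms
        self._pending = None  # (text, time the last key was pressed)

    def key_pressed(self, text, now_ms):
        # Each keystroke replaces the pending query and restarts the wait.
        self._pending = (text, now_ms)

    def poll(self, now_ms):
        """Return the query to run if typing has paused long enough, else None."""
        if self._pending and now_ms - self._pending[1] >= self.delay_ms:
            text, _ = self._pending
            self._pending = None
            return text
        return None


# Simulate typing "burf" with 50 ms between keys and a 300 ms debounce.
debouncer = Debouncer(delay_ms=300)
queries = []
for t, text in [(0, "b"), (50, "bu"), (100, "bur"), (150, "burf")]:
    debouncer.key_pressed(text, t)
    fired = debouncer.poll(t)
    if fired:
        queries.append(fired)

fired = debouncer.poll(500)  # the user has now paused for 350 ms
if fired:
    queries.append(fired)

print(queries)
```

Four keystrokes end up as a single query, which is the difference between one database hit and four.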

Next Steps

  • Increase dataset from 10 million to 100 million and make it fast as f*ck
  • Add site pages (contact, about, submit etc)
  • Add tag cloud and routing to make it better for SEO

Business Success Diploma

So ages ago I signed up to Shaw Academy as it has some great courses and reviews. I got a lifetime membership so I wasn’t in a rush. Since I finished the bodybuilding show, I have been caning through the course at super speed and hope to take my exam this week :) It was one of my New Year’s resolutions (so was Hack24 🙁 ). I am learning a lot and hope to carry on with more advanced training around business.


The above projects are the fluff, as I call it, to get out of the way so that I can focus on Echo and getting this startup off the ground. I have already started building a basic website for them. Once that’s complete, it’s time to focus on the MVP.


When it rains, it pours!  So before I had even thought about the above 3 projects, I had decided I was going to build a humanoid out of VEX EDR, which is a brilliant building platform.  They had just brought out their new V5 system which looks epic (more powerful motors, more motors, vision control, touch screen etc).  VEX was very nice in sending me some V5 hardware to start building this 3D Printed / VEX EDR humanoid!  This will be my chilled weekend project 🙂


A busy week : CTO, 70 million and a Java backend!

As I do my final prep for my bodybuilding show tomorrow, I thought I would do a quick post of what’s happened over the week.  I had planned to do very little due to the show however people who know me, know I don’t like to stay still for long!


So, a while back I got invited to become the part-time CTO of a small startup called Echo MG (Music Global), who have big plans to change how entertainment, artists etc are booked. The role would include everything from designing their infrastructure to helping pitch their MVP to investors. It all sounded very exciting; however, I had to turn it down at the time due to changing my main job (from O2 to Reach). Now I have settled in and got approval, I have gone for the role. I hope to post more about this as it develops, but it should be a good learning experience.

70 Million

So, I kicked things off again on Friday and it’s been going like the clappers! (and not even using the blade server). It’s currently sitting at 70 million pages and once it gets to 100 million, I plan to stop it and start the NLP parsing, which should be super interesting.

Java Spring Boot (Kotlin)

So I have had to think long and hard about what to write my search engine, AI/ML platform and robotics stuff in. The obvious choice is Python, but I just don’t like the syntax (spaces, no brackets etc). So after looking around and seeing what would also help me for work, I decided to learn Spring Boot using Kotlin. Kotlin is a great language; it’s very like Swift, which is brilliant. I can use it with Android, so it helps at work, and there is not much you can’t do with Java. Spring Boot seems to be like black magic: you go to their site, tell them the frameworks, build tools etc, press a button and it makes a project for you 🙂


So I bought a Chromebook. It’s actually the 2nd one I have owned, maybe the 3rd, but that was before they could run Android apps, which is sweet! Why, you ask? Well, I wanted a cheap, light laptop with good battery life and no noise (i.e. I can’t dev on it) to focus me on actually planning and writing stuff down. So by being fairly limiting, and running off the Google ecosystem, it should make me more productive!


Let’s kick it off again!

So one of the only plus points of the insomnia caused by the extreme diet for this bodybuilding show is that my mind gets very creative and forces me to start kicking off new ideas, projects, missions etc!

So, if you saw my last post, I said I was gonna finish Hack24, fix the search engine and sort the garage! So far, the search engine is back up but about to completely change, the garage is nearly finished being geared up as a robotics lab, and Hack24 has not moved. I do want to finish Hack24 but I don’t want to rush it, and I want to harness my energy on some crazy robotics ideas while my brain still works 🙂

So the plan v2!  Warning it’s a little bonkers, even for me!

Build a backend set of machine learning APIs that mobile devices and my robots use to send and retrieve data. The idea is I could send it a question, a command or an image and it does some magic and responds.

  • So for mobile devices, they would send images and text to speech, it would return ImageNet classification or answers to questions.
  • The search engine would become more of a knowledge base system using NLP to feed into other systems.
  • There would also be a public facing chatbot which would hopefully learn off of all of this.  Planning a system POC using AIML to test the waters
  • This would all somehow be also brought together to add some usefulness to my future robotics projects (image classification, knowledge base, etc)

I bought some odd bits of hardware, upgraded the server, bought some domains, and started rewriting the search engine in Java. I decided I want to try and use a common language, and randomly Java seemed the best fit (client, server, mobile etc).

It’s gonna be slow progress but I think it’s gonna be exciting.