Artificially Reducing Intelligence (Pt. 1)

Posted on Sun 06 June 2021 in python

Carrying on with the fermentation theme, my long-term goal is to use machine learning to design a (possibly) awesome beer recipe. However, to train a model we need data linking recipes to consumer opinions. While there are plenty of online sources for opinions, commercial beer recipes are often closely guarded secrets, and it is especially rare for them to be shared in a structured data format. One (sort of) exception to this rule is the BrewDog DIY Dog. The catalogue of all BrewDog recipes is published annually in PDF format, allowing homebrewers to have a go themselves. While the recipes are largely written in a consistent way, parsing them into a machine-readable format posed some challenges, and a couple of different libraries (PyPDF2 and tabula-py) were required.
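Once a PDF library has turned a recipe page into plain text, the remaining step is pulling named fields out of that text. The sketch below shows one way to do this with Python's standard `re` module. It assumes a hypothetical page layout (`SAMPLE_PAGE`, the field labels in `FIELD_PATTERNS`, and the `parse_basics` helper are all illustrative, not the real DIY Dog layout or the post's own code), and it stands in for whatever PyPDF2 would actually return.

```python
import re

# Hypothetical extracted text for one recipe page; the real DIY Dog
# layout, labels and spacing may differ.
SAMPLE_PAGE = """\
EXAMPLE ALE
BASICS
VOLUME 20L
BOIL VOLUME 25L
ABV 6%
TARGET OG 1056
TARGET FG 1010
EBC 17
IBU 60
"""

# Field name -> regex capturing its numeric value (labels are assumptions).
FIELD_PATTERNS = {
    "abv": r"^ABV\s+([\d.]+)%",
    "ibu": r"^IBU\s+([\d.]+)",
    "og": r"^TARGET OG\s+([\d.]+)",
    "fg": r"^TARGET FG\s+([\d.]+)",
    "ebc": r"^EBC\s+([\d.]+)",
}


def parse_basics(text):
    """Pull numeric fields out of one page of extracted text."""
    record = {}
    for field, pattern in FIELD_PATTERNS.items():
        # MULTILINE lets ^ match at the start of every line, not just the text.
        match = re.search(pattern, text, flags=re.MULTILINE)
        # Missing fields become None rather than raising, since pages vary.
        record[field] = float(match.group(1)) if match else None
    return record


print(parse_basics(SAMPLE_PAGE))
```

Returning `None` for absent fields keeps one malformed page from stopping a batch run, which seems preferable when the source is a long, mostly but not entirely consistent catalogue.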



Webscraping Metacritic

Posted on Wed 27 February 2019 in python

When looking for interesting data sets to play with, I'm often searching through Kaggle thinking 'this looks awesome, if only it had this attribute' or 'if only it covered a different time period'. Most recently, this happened when looking through the 'Kaggle data set for metacritic'. The natural next step was to start scraping my own data, which is the process I want to cover in this post.

Inevitably, Metacritic's HTML has changed since the process documented in the Kaggle kernel was written, so the HTML parsing had to be reworked. This also makes an excellent excuse to get some practice with natural languages and parsing (thanks, Chomsky!).
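Reworking a scraper usually comes down to matching elements by tag or class and collecting their text. As a minimal sketch, the example below uses Python's standard-library `html.parser` (not necessarily the tooling the post uses) on hypothetical markup: the `title` and `metascore` class names, `SAMPLE_HTML` and `ScoreParser` are placeholders, not Metacritic's actual page structure.

```python
from html.parser import HTMLParser

# Hypothetical markup; class names are placeholders, not Metacritic's real HTML.
SAMPLE_HTML = """
<div class="product">
  <h3 class="title">Game One</h3>
  <span class="metascore">91</span>
</div>
<div class="product">
  <h3 class="title">Game Two</h3>
  <span class="metascore">74</span>
</div>
"""


class ScoreParser(HTMLParser):
    """Collect (title, score) pairs from elements tagged with known classes."""

    def __init__(self):
        super().__init__()
        self.results = []
        self._field = None  # which field the next text node belongs to
        self._current = {}

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None.
        classes = (dict(attrs).get("class") or "").split()
        if "title" in classes:
            self._field = "title"
        elif "metascore" in classes:
            self._field = "score"

    def handle_endtag(self, tag):
        self._field = None

    def handle_data(self, data):
        if self._field is None or not data.strip():
            return
        self._current[self._field] = data.strip()
        # Emit a record once both fields for one product have been seen.
        if "title" in self._current and "score" in self._current:
            self.results.append(
                (self._current["title"], int(self._current["score"]))
            )
            self._current = {}


parser = ScoreParser()
parser.feed(SAMPLE_HTML)
print(parser.results)  # prints [('Game One', 91), ('Game Two', 74)]
```

Keeping the class names in one place makes the inevitable next markup change a small edit rather than a rewrite.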

