Webscraping Metacritic

Posted on Wed 27 February 2019 in python

When looking for interesting data sets to play with, I often find myself searching through Kaggle thinking 'this looks awesome, if only it had this attribute' or 'if only it covered a different time period'. Most recently, this happened while I was looking through the 'Kaggle data set for Metacritic'. The natural next step was to start scraping my own data, which is the process I want to cover in this post.

Inevitably, Metacritic's HTML has changed since the process documented in the Kaggle kernel was written, so the HTML parsing had to be reworked. This also makes it an excellent excuse to get some practice with natural languages and parsing. Thanks, Chomsky!

