Visualising Geospatial Data
Posted on Fri 08 February 2019 in python
Probably the main reason I started studying Geology was for the opportunities that it gives you to travel. In the first instance, I thought spending two weeks in the Lake District would be a good break while studying maths and computing at college. It turned out this break would end up being 17 years, although as lovely as they are, thankfully that entire time wasn't spent in the lakes. Needless to say, I love looking at and making maps, so exploring georeferenced data is a natural progression as I'm learning about data science.
After being insprired by the excellent 'The Information Capital', I was keen to get my hands on some data and try and make my own map. In looking around for data, I remembered that thanks to OCD I kept a digital travel diary within Google Earth. Aha, a data set! So this post will explore feature creation and visualisation of my own global (/carbon) footprint.
This is how the data looks within the Google Earth API, so let's see if we can create something better..
Installing packages and dependancies¶
I deciding to go down the geopandas and geoplot route after being frustrated by the licensing requirments for using Google Maps. While this proved to add a whole bunch of options for customisation not possible though Google Maps, I found installing these packages on Windows incredibly difficult due to dependacy issues. If you're in the same boat and using Windows, I would definitely recommend saving yourself days of frustration and following this post by the wonderful Programming Adventures.
Data¶
After converting the kmz archive to csv, the data consists of a series of long/lat positions. These need to be converted to geopandas dataframes in order to plot them with geoplot:
%matplotlib inline
import pandas as pd
import geopandas as gpd
from shapely.geometry import Point, LineString
import matplotlib.pyplot as plt
import geoplot as gplt
import geoplot.crs as gcrs
import cartopy
hotels = pd.read_csv('data/travel.csv', encoding='latin-1', index_col=False)
hotels['Coordinates'] = list(zip(hotels.Long, hotels.Lat))
hotels['Coordinates'] = hotels['Coordinates'].apply(Point)
hotels = gpd.GeoDataFrame(hotels, geometry='Coordinates')
hotels.head(3)
Following some inspiration looking at the geoplot examples including this plot of flight data, I wanted to create another dataframe containing my personal flight history. After wracking my brain during a long train journey, I had a list of origins and destinations references in the form of IATA airport codes. In order to convert these to long/lat coordinates, I picked up an open source database from the US Bureau of Transportation Statistics and then created another geopandas dataframe:
flights = pd.read_csv('data/flights.csv')
airports = pd.read_csv('data/airports.csv', index_col=False)
airports = airports.groupby(['AIRPORT']).first().reset_index()
flights_origin = flights.merge(airports[['AIRPORT','LONGITUDE','LATITUDE']], how='left', left_on='Origin',
right_on='AIRPORT').sort_values(by=['ID'])
flights_dest = flights.merge(airports[['AIRPORT','LONGITUDE','LATITUDE']], how='left', left_on='Destination',
right_on='AIRPORT').sort_values(by=['ID'])
flights['Long_Origin'] = flights_origin['LONGITUDE']
flights['Lat_Origin'] = flights_origin['LATITUDE']
flights['Long_Destination'] = flights_dest['LONGITUDE']
flights['Lat_Destination'] = flights_dest['LATITUDE']
flights['XY_Origin'] = flights[['Long_Origin', 'Lat_Origin']].apply(tuple, axis=1).apply(Point)
flights['XY_Destination'] = flights[['Long_Destination', 'Lat_Destination']].apply(tuple, axis=1).apply(Point)
flights['XY_Line'] = flights[['XY_Origin', 'XY_Destination']].apply(list, axis=1).apply(LineString)
flights = gpd.GeoDataFrame(flights, geometry='XY_Line')
flights.head(3)
Plotting¶
The two data frames are then plotted on a projection of the earth. I really wanted to create something different from Google Maps, so without the concept of shame, went for a long exposure photography colour scheme.
fig = plt.figure(figsize=(18, 8), facecolor='black')
ax1 = fig.add_subplot(111, projection=gcrs.PlateCarree())
ax1.set_global()
ax1.add_feature(cartopy.feature.BORDERS, linewidth=0.5, edgecolor='grey')
ax1.add_feature(cartopy.feature.COASTLINE, linewidth=0.5, edgecolor='grey')
gplt.sankey(flights, start='XY_Origin', end='XY_Destination', projection=gcrs.PlateCarree(),
color='#fefede', linewidth=2, alpha=0.3, ax=ax1)
gplt.pointplot(hotels, projection=gcrs.PlateCarree(), marker='o', color='#fefede', alpha=0.6, ax=ax1)
Next steps¶
While I really enjoyed learning how to use geoplot and was happy with how the visualisation turned out. The potential to create beautiful visualisations that aren't only of interest to myself is really exciting. Just to give an idea of the potential of the packages, here are a couple of examples showing different projects and the various types of open source data that can be plotted. I'm wondering how I can integrate this in to a real geological project next..
fig = plt.figure(num=1, figsize=(18, 7))
ax1 = fig.add_subplot(121, projection=gcrs.Orthographic(central_longitude=30, central_latitude=20))
ax2 = fig.add_subplot(122, projection=gcrs.Orthographic(central_longitude=-50, central_latitude=20))
#http//www.naturalearthdata.com/download/110m/physical/ne_10m_geography_regions_polys.zip
basin = cartopy.feature.NaturalEarthFeature(
scale='10m',
category='physical',
name='geography_regions_polys',
facecolor='gby',
alpha=0.2)
ax1.set_global()
ax1.add_feature(cartopy.feature.OCEAN, color='#aadaff')
ax1.add_feature(basin)
ax1.add_feature(cartopy.feature.COASTLINE, linewidth=0.5, edgecolor='darkgrey')
ax2.set_global()
ax2.stock_img()
ax2.add_feature(cartopy.feature.BORDERS, linewidth=1, edgecolor='darkgrey')
ax2.add_feature(cartopy.feature.COASTLINE, linewidth=1, edgecolor='darkgrey')