Visualising Geospatial Data

Posted on Fri 08 February 2019 in python

Probably the main reason I started studying Geology was for the opportunities that it gives you to travel. In the first instance, I thought spending two weeks in the Lake District would be a good break while studying maths and computing at college. It turned out this break would end up being 17 years, although as lovely as they are, thankfully that entire time wasn't spent in the lakes. Needless to say, I love looking at and making maps, so exploring georeferenced data is a natural progression as I'm learning about data science.

After being insprired by the excellent 'The Information Capital', I was keen to get my hands on some data and try and make my own map. In looking around for data, I remembered that thanks to OCD I kept a digital travel diary within Google Earth. Aha, a data set! So this post will explore feature creation and visualisation of my own global (/carbon) footprint.

This is how the data looks within the Google Earth API, so let's see if we can create something better..

Google Maps

Installing packages and dependancies

I deciding to go down the geopandas and geoplot route after being frustrated by the licensing requirments for using Google Maps. While this proved to add a whole bunch of options for customisation not possible though Google Maps, I found installing these packages on Windows incredibly difficult due to dependacy issues. If you're in the same boat and using Windows, I would definitely recommend saving yourself days of frustration and following this post by the wonderful Programming Adventures.

Data

After converting the kmz archive to csv, the data consists of a series of long/lat positions. These need to be converted to geopandas dataframes in order to plot them with geoplot:

In [3]:
%matplotlib inline

import pandas as pd
import geopandas as gpd
from shapely.geometry import Point, LineString
import matplotlib.pyplot as plt
import geoplot as gplt
import geoplot.crs as gcrs
import cartopy

hotels = pd.read_csv('data/travel.csv', encoding='latin-1', index_col=False)
hotels['Coordinates'] = list(zip(hotels.Long, hotels.Lat))
hotels['Coordinates'] = hotels['Coordinates'].apply(Point)
hotels = gpd.GeoDataFrame(hotels, geometry='Coordinates')
hotels.head(3)
Out[3]:
Long Lat Coordinates
0 24.606567 35.389321 POINT (24.60656653 35.38932092)
1 5.457758 60.372815 POINT (5.457757622999999 60.37281479999999)
2 12.482761 43.579608 POINT (12.48276071 43.57960803)

Following some inspiration looking at the geoplot examples including this plot of flight data, I wanted to create another dataframe containing my personal flight history. After wracking my brain during a long train journey, I had a list of origins and destinations references in the form of IATA airport codes. In order to convert these to long/lat coordinates, I picked up an open source database from the US Bureau of Transportation Statistics and then created another geopandas dataframe:

In [4]:
flights = pd.read_csv('data/flights.csv')
airports = pd.read_csv('data/airports.csv', index_col=False)
airports = airports.groupby(['AIRPORT']).first().reset_index()
flights_origin = flights.merge(airports[['AIRPORT','LONGITUDE','LATITUDE']], how='left', left_on='Origin',
                right_on='AIRPORT').sort_values(by=['ID'])
flights_dest = flights.merge(airports[['AIRPORT','LONGITUDE','LATITUDE']], how='left', left_on='Destination',
                right_on='AIRPORT').sort_values(by=['ID'])
flights['Long_Origin'] = flights_origin['LONGITUDE']
flights['Lat_Origin'] = flights_origin['LATITUDE']
flights['Long_Destination'] = flights_dest['LONGITUDE']
flights['Lat_Destination'] = flights_dest['LATITUDE']
flights['XY_Origin'] = flights[['Long_Origin', 'Lat_Origin']].apply(tuple, axis=1).apply(Point)
flights['XY_Destination'] = flights[['Long_Destination', 'Lat_Destination']].apply(tuple, axis=1).apply(Point)
flights['XY_Line'] = flights[['XY_Origin', 'XY_Destination']].apply(list, axis=1).apply(LineString)
flights = gpd.GeoDataFrame(flights, geometry='XY_Line')
flights.head(3)
Out[4]:
ID Origin Destination Long_Origin Lat_Origin Long_Destination Lat_Destination XY_Origin XY_Destination XY_Line
0 0 BHX LPA -1.746389 52.453056 -15.383333 27.931944 POINT (-1.74638889 52.45305556) POINT (-15.38333333 27.93194444) LINESTRING (-1.74638889 52.45305556, -15.38333...
1 1 LPA BHX -15.383333 27.931944 -1.746389 52.453056 POINT (-15.38333333 27.93194444) POINT (-1.74638889 52.45305556) LINESTRING (-15.38333333 27.93194444, -1.74638...
2 2 BHX TFS -1.746389 52.453056 -16.570556 28.042778 POINT (-1.74638889 52.45305556) POINT (-16.57055556 28.04277778) LINESTRING (-1.74638889 52.45305556, -16.57055...

Plotting

The two data frames are then plotted on a projection of the earth. I really wanted to create something different from Google Maps, so without the concept of shame, went for a long exposure photography colour scheme.

In [26]:
fig = plt.figure(figsize=(18, 8), facecolor='black')
ax1 = fig.add_subplot(111, projection=gcrs.PlateCarree())

ax1.set_global()
ax1.add_feature(cartopy.feature.BORDERS, linewidth=0.5, edgecolor='grey')
ax1.add_feature(cartopy.feature.COASTLINE, linewidth=0.5, edgecolor='grey')

gplt.sankey(flights, start='XY_Origin', end='XY_Destination', projection=gcrs.PlateCarree(),
            color='#fefede', linewidth=2, alpha=0.3, ax=ax1)
gplt.pointplot(hotels, projection=gcrs.PlateCarree(), marker='o', color='#fefede', alpha=0.6, ax=ax1)
Out[26]:
<cartopy.mpl.geoaxes.GeoAxesSubplot at 0x1d8862e1860>

Next steps

While I really enjoyed learning how to use geoplot and was happy with how the visualisation turned out. The potential to create beautiful visualisations that aren't only of interest to myself is really exciting. Just to give an idea of the potential of the packages, here are a couple of examples showing different projects and the various types of open source data that can be plotted. I'm wondering how I can integrate this in to a real geological project next..

In [27]:
fig = plt.figure(num=1, figsize=(18, 7))
ax1 = fig.add_subplot(121, projection=gcrs.Orthographic(central_longitude=30, central_latitude=20))
ax2 = fig.add_subplot(122, projection=gcrs.Orthographic(central_longitude=-50, central_latitude=20))

#http//www.naturalearthdata.com/download/110m/physical/ne_10m_geography_regions_polys.zip
basin = cartopy.feature.NaturalEarthFeature(
        scale='10m',
        category='physical',
        name='geography_regions_polys',
        facecolor='gby',
        alpha=0.2)

ax1.set_global()
ax1.add_feature(cartopy.feature.OCEAN, color='#aadaff')
ax1.add_feature(basin)
ax1.add_feature(cartopy.feature.COASTLINE, linewidth=0.5, edgecolor='darkgrey')

ax2.set_global()
ax2.stock_img()
ax2.add_feature(cartopy.feature.BORDERS, linewidth=1, edgecolor='darkgrey')
ax2.add_feature(cartopy.feature.COASTLINE, linewidth=1, edgecolor='darkgrey')
Out[27]:
<cartopy.mpl.feature_artist.FeatureArtist at 0x1d8865c34e0>