Mapping Vancouver's Buses
I came across @realtimebus the other day and immediately wanted to recreate it for Vancouver. Luckily, Translink has a realtime bus API that I could pull from, and making an animated GIF is pretty straightforward in Python. And since I already have some experience making Twitter bots with @VanBikeShareBot, I felt like I didn't really have a choice.
First I needed to go to the Translink developers website to get an API key and put it in a file called credentials.py, like so:
translink_api_key = 'myapikeystring'
This allows me to import the key as a variable without making it public.
from credentials import translink_api_key
The next step is to write a function that queries the API and dumps the bus information to a CSV file. My strategy here is to read the XML from the API call and convert it to a Pandas dataframe which gets saved as a CSV. The first call creates the CSV file and subsequent queries append to the end of the file.
import urllib.request
import xml.etree.ElementTree as ET
import pandas as pd
import datetime
import time
def query():
    # Load the data saved by earlier calls, or start fresh on the first call
    try:
        df = pd.read_csv('buses.csv')
    except FileNotFoundError:
        df = pd.DataFrame()
    # Query Translink API
    u = f"http://api.translink.ca/rttiapi/v1/buses?apikey={translink_api_key}"
    attnames = ['VehicleNo','TripId','RouteNo',
                'Direction','Pattern','RouteMap','Latitude','Longitude','RecordedTime','Href']
    with urllib.request.urlopen(u) as url:
        data = url.read().decode()
    buses = ET.fromstring(data)
    # Build one row per bus from the text of its child elements
    rows = []
    for bus in buses:
        atts = [att.text for att in bus]
        rows.append({name:att for name,att in zip(attnames,atts)})
    ndf = pd.DataFrame(rows)
    ndf.RecordedTime = pd.to_datetime(ndf.RecordedTime)
    ndf.Latitude = ndf.Latitude.astype(float)
    ndf.Longitude = ndf.Longitude.astype(float)
    ndf = ndf[ndf.Latitude!=0] # Drop lats/long == 0
    # pd.concat replaces DataFrame.append, which was removed in pandas 2.0
    df = pd.concat([df,ndf])
    df = df.drop_duplicates()
    df.to_csv('buses.csv',index=False)
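To see the XML-to-dataframe step without an API key, here is a minimal sketch that runs on a made-up sample. The sample XML and the parse_buses helper are my own assumptions for illustration, not Translink's documented response. It also keys each value by its tag name instead of by position, which is a small variation on the zip approach used in query().

```python
import xml.etree.ElementTree as ET
import pandas as pd

# Hypothetical sample, shaped like the response query() expects:
# a root element whose children are buses, each with one child per field.
SAMPLE_XML = """
<Buses>
  <Bus><VehicleNo>1503</VehicleNo><Latitude>49.2634</Latitude><Longitude>-123.0982</Longitude></Bus>
  <Bus><VehicleNo>2210</VehicleNo><Latitude>0</Latitude><Longitude>0</Longitude></Bus>
</Buses>
"""

def parse_buses(xml_text):
    """Turn bus XML into a dataframe, keyed by tag name rather than position."""
    root = ET.fromstring(xml_text)
    rows = [{field.tag: field.text for field in bus} for bus in root]
    out = pd.DataFrame(rows)
    out['Latitude'] = out['Latitude'].astype(float)
    out['Longitude'] = out['Longitude'].astype(float)
    return out[out['Latitude'] != 0]  # drop buses reporting a 0,0 position

sample_df = parse_buses(SAMPLE_XML)
print(sample_df)
```

Keying by tag name means the columns would still line up if the fields ever arrived in a different order, at the cost of trusting whatever tag names the server sends.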
For my project I run this function once a minute, then make a GIF every hour. For this exercise we just need to run it a few times to get some data. Let's query the API 5 times before moving on.
for i in range(5):
    print('query')
    query()
    time.sleep(60)
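If you leave a loop like this running for hours, one failed request shouldn't kill the whole run. Here is a rough sketch of a more forgiving loop. The poll helper and the flaky stand-in task are hypothetical names I made up for the demo, and the sleep function is passed in so the example runs instantly.

```python
import time

def poll(task, n_times, interval_s=60, sleep=time.sleep):
    """Call task() n_times, pausing between calls; return the number of failures."""
    failures = 0
    for i in range(n_times):
        try:
            task()
        except Exception as err:  # e.g. a network timeout
            failures += 1
            print(f'query {i} failed: {err}')
        if i < n_times - 1:  # no need to wait after the last call
            sleep(interval_s)
    return failures

# Stand-in task that fails on its second call, to exercise the error path
calls = []
def flaky():
    calls.append(len(calls))
    if len(calls) == 2:
        raise OSError('simulated timeout')

failed = poll(flaky, 3, interval_s=0)
print(failed, 'failure(s) out of', len(calls), 'calls')
```

With the real function this would be called as poll(query, 5).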
Now we've built up some data, but it's in a fairly messy form. We need to transform the dataframe into something we can cycle through to make an animation.
df = pd.read_csv('buses.csv')
df.head()
My goal now is to convert the dataframe such that each row is a timepoint and each column is an individual bus, with the values being the location coordinates. For simplicity I'm also going to break out latitude and longitude into separate dataframes.
latdf = pd.pivot_table(df,index='RecordedTime',values='Latitude',columns='VehicleNo',aggfunc='first')
longdf = pd.pivot_table(df,index='RecordedTime',values='Longitude',columns='VehicleNo',aggfunc='first')
latdf.head()
latdf[1503].dropna()
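Here is the same pivot on a tiny made-up table, which makes the resulting shape easier to see. The vehicle numbers, times and latitudes are invented for illustration.

```python
import pandas as pd

# Three made-up location reports from two buses
toy = pd.DataFrame({
    'VehicleNo':    [1503, 2210, 1503],
    'RecordedTime': ['10:00:05', '10:00:09', '10:01:02'],
    'Latitude':     [49.26, 49.28, 49.27],
})

# One row per timestamp, one column per bus
toy_lat = pd.pivot_table(toy, index='RecordedTime', values='Latitude',
                         columns='VehicleNo', aggfunc='first')
print(toy_lat)
```

Each row has a value only for the bus that reported at that exact time, and NaN everywhere else.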
The dataframe looks empty but it's really just sparse. Each bus has a different "RecordedTime" value: the last time at which it reported its location to Translink's servers. What I want to do is first reindex the dataframe so that the index is continuous through the time frame we're looking at, then interpolate the coordinates for each bus along each column. The interpolation needs to be run on floats, not tuples, which is why I separated out the lats and longs, but I'll recombine them into coordsdf once the interpolation is done.
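df.RecordedTime = pd.to_datetime(df.RecordedTime)
# The pivot tables were built from timestamps read back from the CSV as strings,
# so convert their indexes to datetimes before reindexing against a date range
latdf.index = pd.to_datetime(latdf.index)
longdf.index = pd.to_datetime(longdf.index)
idx = pd.date_range(df.RecordedTime.min(),df.RecordedTime.max(),freq='s')
latdf = latdf.reindex(idx)
longdf = longdf.reindex(idx)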
latdf_interp = latdf.interpolate(method='time',limit_direction='both')
longdf_interp = longdf.interpolate(method='time',limit_direction='both')
latdf_interp.head()
coordsdf = latdf_interp.combine(longdf_interp,lambda x,y: tuple(zip(y,x)))
# We don't actually need a data point for each second to make a smooth
# GIF, so let's only keep every 10th row.
coordsdf = coordsdf.iloc[::10]
coordsdf.head()
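The reindex-then-interpolate step is easier to follow on a toy example. This sketch uses one made-up bus with two reports four seconds apart. The values are invented, and it leaves out the step that recombines the coordinates into tuples.

```python
import pandas as pd

# One bus, two reports four seconds apart
times = pd.to_datetime(['2019-01-01 10:00:00', '2019-01-01 10:00:04'])
sparse = pd.DataFrame({1503: [49.0, 49.4]}, index=times)

# Reindex to one row per second (new rows are NaN), then fill them in
every_second = pd.date_range(times.min(), times.max(), freq='s')
dense = sparse.reindex(every_second).interpolate(method='time')
print(dense)
```

The two measured rows become five, with the three in-between seconds filled in along a straight line.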
This is something I can work with! I'll start by drawing a static scatter plot of the first row of our data.
import matplotlib.pyplot as plt
import matplotlib.animation as animation
%matplotlib notebook
f,ax = plt.subplots()
ax.set_facecolor('k')
ax.axes.get_xaxis().set_visible(False)
ax.axes.get_yaxis().set_visible(False)
f.subplots_adjust(left=0, bottom=0, right=1, top=1, wspace=None, hspace=None)
longs = coordsdf.iloc[0].map(lambda x: x[0])
lats = coordsdf.iloc[0].map(lambda x: x[1])
longmin = -123.3206 # Horseshoe bay
longmax = -122.5374 # Langley-ish
latmin = 49.0 # US Border/Ferry terminal
latmax = 49.479576 # Lions Bay
ax.set_xlim(longmin,longmax)
ax.set_ylim(latmin,latmax)
scatter = ax.scatter(longs,lats,s=1,cmap='cool',c=range(len(coordsdf.columns)))
text = ax.text(0.9,0.9,str(coordsdf.index[0])[:16],size=15,
               color='white',alpha=0.4,transform=ax.transAxes,horizontalalignment='right')
Finally, to turn this into an animation I have to define a function that will be called once per frame. It just needs to pull the latitudes and longitudes from the next row in our dataframe, update the scatter plot and update the timestamp text.
To create the animation, FuncAnimation just needs the original figure handle, the update function, the number of frames and the time interval between frames. We want one frame for each row in the dataframe. I found that a 50 ms interval between frames gives a nice smooth appearance, but for this short demonstration I'll slow it down to 100 ms.
def run(i):
    longs = coordsdf.iloc[i].map(lambda x: x[0])
    lats = coordsdf.iloc[i].map(lambda x: x[1])
    scatter.set_offsets(list(zip(longs,lats)))
    text.set_text(str(coordsdf.index[i])[:16])
frames=len(coordsdf)
ani = animation.FuncAnimation(f,run,frames=frames, interval=100)
ani.save('buses_animation.gif',writer='imagemagick')
You might notice that most of the buses stay still for the first few seconds. This is because a few buses are more delayed in updating their position, so they have an earlier RecordedTime value than the other buses. Because the interpolation is run with limit_direction='both', each bus is held at its first measured position until its own measurements begin.
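You can reproduce this effect on a toy series. In this sketch a made-up bus has no report for the first two seconds, and the latitudes are invented for illustration.

```python
import numpy as np
import pandas as pd

idx = pd.date_range('2019-01-01 10:00:00', periods=5, freq='s')
# No position for the first two seconds, then two real reports
bus = pd.Series([np.nan, np.nan, 49.20, np.nan, 49.24], index=idx)

# limit_direction='both' also fills the gap before the first report
filled = bus.interpolate(method='time', limit_direction='both')
print(filled)
```

The two leading gaps are filled with the first measured value, so the bus sits still until its real data starts, while the gap between the two reports is filled along a straight line.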
It's up to you whether you want to create a GIF or an mp4 video file. If you prefer mp4, you might want to use the ffmpeg writer instead of imagemagick. Videos have the advantage of smaller file sizes, but I'm having some trouble uploading mp4s through the Twitter API so I'm sticking with GIFs for now.
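If you aren't sure which writers are installed on your machine, matplotlib can tell you. Here is a small sketch that picks the first available one. The pick_writer helper and its preference order are my own assumptions, not part of the bot.

```python
import matplotlib.animation as animation

def pick_writer(preferred=('ffmpeg', 'imagemagick', 'pillow')):
    """Return (writer_name, file_extension) for the first writer that is installed."""
    extensions = {'ffmpeg': '.mp4', 'imagemagick': '.gif', 'pillow': '.gif'}
    for name in preferred:
        # The writer registry reports whether each backend is usable here
        if animation.writers.is_available(name):
            return name, extensions[name]
    raise RuntimeError('no animation writer available')

writer_name, ext = pick_writer()
print(writer_name, ext)
```

The result can then be passed straight to the save call, for example ani.save('buses_animation' + ext, writer=writer_name).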
The final code for my Twitter bot is available on GitHub. It's organized slightly differently to make it easier to run as cron jobs, but the guts are exactly the same as what's described here. If you have any questions about this project, don't hesitate to get in touch via Twitter or email.