Mapping Vancouver's Buses

I came across @realtimebus the other day and immediately wanted to recreate it for Vancouver. Luckily Translink has a realtime bus API that I could pull from and making an animated gif is pretty straightforward in Python. And since I already have some experience making twitter bots with @VanBikeShareBot I felt like I didn't really have a choice.

First I needed to go to the Translink developers website to get an API key, and put it in a file called credentials.py like so:

translink_api_key = 'myapikeystring'

This allows me to import the key as a variable without making it public.

In [1]:
from credentials import translink_api_key

The next step is to write a function that queries the API and dumps the bus information to a CSV file. My strategy here is to read the XML from the API call and convert it to a Pandas dataframe which gets saved as a CSV. The first call creates the CSV file and subsequent queries append to the end of the file.

In [2]:
import urllib.request
import xml.etree.ElementTree as ET
import pandas as pd
import datetime
import time
In [3]:
def query():

    try:
        df = pd.read_csv('buses.csv')
    except:
        df = pd.DataFrame()

    # Query Translink API
    u = f"http://api.translink.ca/rttiapi/v1/buses?apikey={translink_api_key}"

    attnames = ['VehicleNo','TripId','RouteNo',
                'Direction','Pattern','RouteMap','Latitude','Longitude','RecordedTime','Href']

    with urllib.request.urlopen(u) as url:
        data = url.read().decode()
        buses = ET.fromstring(data)
        ndf = pd.DataFrame()
        for bus in buses:
            atts = [att.text for att in bus]
            atts = {name:att for name,att in zip(attnames,atts)}
            ndf = ndf.append(atts,ignore_index=True)


        ndf.RecordedTime = pd.to_datetime(ndf.RecordedTime)
        ndf.Latitude = ndf.Latitude.astype(float)
        ndf.Longitude = ndf.Longitude.astype(float)
        ndf = ndf[ndf.Latitude!=0]  # Drop lats/long == 0

    df = df.append(ndf)
    df = df.drop_duplicates()

    df.to_csv('buses.csv',index=False)

For my project I run this function once a minute then make a GIF every hour. For this exercice we just need to run it a few times to get some data. Let's query the API 5 times before moving on.

In [4]:
for i in range(5):
    print('query')
    query()
    time.sleep(60)
query
query
query
query
query

Now we've built up some data, but it's in a fairly messy form. We need to transform the dataframe into something we can cycle through to make an animations.

In [5]:
df = pd.read_csv('buses.csv')
df.head()
Out[5]:
Direction Href Latitude Longitude Pattern RecordedTime RouteMap RouteNo TripId VehicleNo
0 WEST NaN 49.234800 -123.186233 UBC 2019-02-16 21:31:27 WB1 49 10210880 12001
1 WEST NaN 49.224150 -122.999967 UBC 2019-02-16 21:31:21 WB1 49 10210889 12003
2 WEST NaN 49.225383 -123.083500 UBC 2019-02-16 21:31:27 WB1 49 10210851 12016
3 EAST NaN 49.265750 -122.778250 PT COQ STN 2019-02-16 21:31:22 EB1PC 160 10224133 14002
4 WEST NaN 49.279817 -122.794150 COQ CTRL STN 2019-02-16 21:30:23 WB2 188 10227563 14006

My goal now is to convert the dataframe such that each row is a timepoint and each column is an individual bus, with the values being the location coordinates. For simplicity I'm also going to break out latitude and longitude into separate dataframes.

In [28]:
latdf = pd.pivot_table(df,index='RecordedTime',values='Latitude',columns='VehicleNo',aggfunc='first')
longdf = pd.pivot_table(df,index='RecordedTime',values='Longitude',columns='VehicleNo',aggfunc='first')
In [8]:
latdf.head()
Out[8]:
VehicleNo 1503 1509 1510 1511 1515 1516 1517 1518 1519 1520 ... 18506 18508 18509 18524 80705 80903 80906 80909 81205 81210
RecordedTime
2019-02-16 21:19:06 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
2019-02-16 21:20:12 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
2019-02-16 21:21:53 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
2019-02-16 21:22:00 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
2019-02-16 21:22:49 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN

5 rows × 411 columns

In [9]:
latdf[1503].dropna()
Out[9]:
RecordedTime
2019-02-16 21:31:14    49.261317
2019-02-16 21:31:50    49.260883
2019-02-16 21:33:15    49.263033
2019-02-16 21:34:26    49.265217
2019-02-16 21:35:17    49.265367
2019-02-16 21:46:30    49.275183
Name: 1503, dtype: float64

The dataframe looks empty but it's really just sparse. Each bus has a different "RecordedTime" value, the last time which it reported its location to Translink's servers. What I want to do it first reindex the dataframe so that the index is continuous through the time frame we're looking at, then interpolate the coordinates for each bus along each column. The interpolation needs to be run on floats not tuples which is why I separated out the lats and longs, but I'll recombine them into coordsdf once the interpolation is done.

In [29]:
df.RecordedTime = pd.to_datetime(df.RecordedTime)
idx = pd.date_range(df.RecordedTime.min(),df.RecordedTime.max(),freq='s')
In [36]:
latdf = latdf.reindex(idx)
longdf = longdf.reindex(idx)
In [37]:
latdf_interp = latdf.interpolate(method='time',limit_direction='both')
longdf_interp = longdf.interpolate(method='time',limit_direction='both')
In [38]:
latdf_interp.head()
Out[38]:
VehicleNo 1503 1509 1510 1511 1515 1516 1517 1518 1519 1520 ... 18506 18508 18509 18524 80705 80903 80906 80909 81205 81210
2019-02-16 21:19:06 49.261317 49.2798 49.206667 49.2011 49.214817 49.217083 49.21485 49.0363 49.224 49.259633 ... 49.32845 49.328433 49.374467 49.200333 49.34105 49.3367 49.374083 49.281467 49.327833 49.2867
2019-02-16 21:19:07 49.261317 49.2798 49.206667 49.2011 49.214817 49.217083 49.21485 49.0363 49.224 49.259633 ... 49.32845 49.328433 49.374467 49.200333 49.34105 49.3367 49.374083 49.281467 49.327833 49.2867
2019-02-16 21:19:08 49.261317 49.2798 49.206667 49.2011 49.214817 49.217083 49.21485 49.0363 49.224 49.259633 ... 49.32845 49.328433 49.374467 49.200333 49.34105 49.3367 49.374083 49.281467 49.327833 49.2867
2019-02-16 21:19:09 49.261317 49.2798 49.206667 49.2011 49.214817 49.217083 49.21485 49.0363 49.224 49.259633 ... 49.32845 49.328433 49.374467 49.200333 49.34105 49.3367 49.374083 49.281467 49.327833 49.2867
2019-02-16 21:19:10 49.261317 49.2798 49.206667 49.2011 49.214817 49.217083 49.21485 49.0363 49.224 49.259633 ... 49.32845 49.328433 49.374467 49.200333 49.34105 49.3367 49.374083 49.281467 49.327833 49.2867

5 rows × 411 columns

In [39]:
coordsdf = latdf_interp.combine(longdf_interp,lambda x,y: tuple(zip(y,x)))

# We don't actually need a data point for each second to make a smooth
# GIF, so let's only keep every 10th row.
coordsdf = coordsdf.iloc[::10]
In [40]:
coordsdf.head()
Out[40]:
VehicleNo 1503 1509 1510 1511 1515 1516 1517 1518 1519 1520 ... 18506 18508 18509 18524 80705 80903 80906 80909 81205 81210
2019-02-16 21:19:06 (-122.774217, 49.261317) (-123.137633, 49.2798) (-123.017883, 49.206667) (-122.91175, 49.2011) (-122.990033, 49.214817) (-122.921933, 49.217083) (-122.99095, 49.21485) (-123.068583, 49.0363) (-122.9997, 49.224) (-123.25515, 49.259633) ... (-123.15748300000001, 49.32845) (-122.99675, 49.328433) (-123.27725, 49.374467) (-122.9123, 49.200333) (-123.135633, 49.34105) (-123.189683, 49.3367) (-123.2729, 49.374083) (-123.113967, 49.281467) (-123.1538, 49.327833) (-123.12465, 49.2867)
2019-02-16 21:19:16 (-122.774217, 49.261317) (-123.137633, 49.2798) (-123.017883, 49.206667) (-122.91175, 49.2011) (-122.990033, 49.214817) (-122.921933, 49.217083) (-122.99095, 49.21485) (-123.068583, 49.0363) (-122.9997, 49.224) (-123.25515, 49.259633) ... (-123.15748300000001, 49.32845) (-122.99675, 49.328433) (-123.27725, 49.374467) (-122.9123, 49.200333) (-123.135633, 49.34105) (-123.189683, 49.3367) (-123.2729, 49.374083) (-123.113967, 49.281467) (-123.1538, 49.327833) (-123.12465, 49.2867)
2019-02-16 21:19:26 (-122.774217, 49.261317) (-123.137633, 49.2798) (-123.017883, 49.206667) (-122.91175, 49.2011) (-122.990033, 49.214817) (-122.921933, 49.217083) (-122.99095, 49.21485) (-123.068583, 49.0363) (-122.9997, 49.224) (-123.25515, 49.259633) ... (-123.15748300000001, 49.32845) (-122.99675, 49.328433) (-123.27725, 49.374467) (-122.9123, 49.200333) (-123.135633, 49.34105) (-123.189683, 49.3367) (-123.2729, 49.374083) (-123.113967, 49.281467) (-123.1538, 49.327833) (-123.12465, 49.2867)
2019-02-16 21:19:36 (-122.774217, 49.261317) (-123.137633, 49.2798) (-123.017883, 49.206667) (-122.91175, 49.2011) (-122.990033, 49.214817) (-122.921933, 49.217083) (-122.99095, 49.21485) (-123.068583, 49.0363) (-122.9997, 49.224) (-123.25515, 49.259633) ... (-123.15748300000001, 49.32845) (-122.99675, 49.328433) (-123.27725, 49.374467) (-122.9123, 49.200333) (-123.135633, 49.34105) (-123.189683, 49.3367) (-123.2729, 49.374083) (-123.113967, 49.281467) (-123.1538, 49.327833) (-123.12465, 49.2867)
2019-02-16 21:19:46 (-122.774217, 49.261317) (-123.137633, 49.2798) (-123.017883, 49.206667) (-122.91175, 49.2011) (-122.990033, 49.214817) (-122.921933, 49.217083) (-122.99095, 49.21485) (-123.068583, 49.0363) (-122.9997, 49.224) (-123.25515, 49.259633) ... (-123.15748300000001, 49.32845) (-122.99675, 49.328433) (-123.27725, 49.374467) (-122.9123, 49.200333) (-123.135633, 49.34105) (-123.189683, 49.3367) (-123.2729, 49.374083) (-123.113967, 49.281467) (-123.1538, 49.327833) (-123.12465, 49.2867)

5 rows × 411 columns

This is something I can work with! I'll start by drawing a static scatter plot of the first row of our dat

In [42]:
import matplotlib.pyplot as plt
import matplotlib.animation as animation
%matplotlib notebook

f,ax = plt.subplots()
ax.set_facecolor('k')
ax.axes.get_xaxis().set_visible(False)
ax.axes.get_yaxis().set_visible(False)
f.subplots_adjust(left=0, bottom=0, right=1, top=1, wspace=None, hspace=None)
In [43]:
longs = coordsdf.iloc[0].map(lambda x: x[0])
lats =  coordsdf.iloc[0].map(lambda x: x[1])

longmin = -123.3206 # Horseshoe bay
longmax = -122.5374 # Langley-ish
latmin = 49.0       # US Border/Ferry terminal
latmax = 49.479576  # Lions Bay
ax.set_xlim(longmin,longmax)
ax.set_ylim(latmin,latmax)

scatter = ax.scatter(longs,lats,s=1,cmap='cool',c=range(len(coordsdf.columns)))
text = ax.text(0.9,0.9,str(coordsdf.index[0])[:16],size=15,
                         color='white',alpha=0.4,transform=ax.transAxes,horizontalalignment='right')

Finally to turn this into an animation I have to define a function that will be run in sequence to create each frame. I just need to pull the latitudes and longitudes from the next row in our dataframe, update the scatter plot and update the timestamp text.

To create the animation FuncAnimation just needs the original figure handle, the update function, the number of frames and the time interval between frames. We want one frame for each row in the dataframe, and I found that a 50 ms frame rate gives a nice smooth appearance, but for this short demonstration I'll slow it down to 100 ms.

In [44]:
def run(i):

    longs = coordsdf.iloc[i].map(lambda x: x[0])
    lats =  coordsdf.iloc[i].map(lambda x: x[1])

    scatter.set_offsets(list(zip(longs,lats)))
    text.set_text(str(coordsdf.index[i])[:16])
In [45]:
frames=len(coordsdf)
ani = animation.FuncAnimation(f,run,frames=frames, interval=100)
ani.save('buses_animation.gif',writer='imagemagick') 

You might notice that most of the buses stay still for the first few seconds. This is because a few buses are more delayed in updating their position, so they have an earlier RecordedTime value than the other buses. The interpolation function keeps buses in the same spot until they start having measured coordinates.

It's up to you whether you want to create a GIF or an mp4 video file. If you prefer mp4, you might want to use the ffmpeg writer instead of imagemagick. Videos have the advantage of smaller file sizes, but I'm having some trouble uploading mp4s through the twitter API so I'm sticking with GIFs for now.

The final code for my twitter bot available on Github. It's organized slightly differently to make it easier to run as cron jobs, but the guts are exactly the same as what's described here. If you have any questions about this project, don't hesitate to get in touch via twitter or email.

Comments

Comments powered by Disqus