Having transit data available allows us to play and experiment. I wanted to take WMATA’s Metrorail data to animate a day in the life of Washington, DC’s rail system. I chose Google Earth to display the data.
The challenge was to manipulate WMATA’s data into a format that Google Earth accepts, while keeping the result accurate and interesting. WMATA publishes its data according to the General Transit Feed Specification (GTFS, née Google Transit Feed Specification), which is a set of comma-separated values (CSV) text files. Meanwhile, Google Earth needs data in the KML format (Keyhole Markup Language). KML is a type of XML, meaning that it’s a hierarchical listing of tags describing the content.
For my first KML project, I wanted three types of map-based objects:
- stations
- rail lines
- trains
Stations are drawn with the Point tag, like below:
    <Placemark>
      <name>Dupont Circle</name>
      <Point>
        <coordinates>-77.044428,38.911129,0</coordinates>
      </Point>
    </Placemark>
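As a rough illustration (in Python rather than the PHP used for the actual converter), a station placemark like the one above could be generated from one row of stops.txt. The field names `stop_name`, `stop_lat`, and `stop_lon` come from the GTFS specification; the function name `station_placemark` and the sample row are hypothetical.

```python
from xml.sax.saxutils import escape


def station_placemark(stop):
    """Build a KML Point placemark from one stops.txt row (a dict of strings)."""
    # KML expects longitude first, then latitude, then altitude,
    # separated by commas with no spaces inside the tuple.
    coords = f"{stop['stop_lon'].strip()},{stop['stop_lat'].strip()},0"
    return (
        "<Placemark>"
        f"<name>{escape(stop['stop_name'])}</name>"
        f"<Point><coordinates>{coords}</coordinates></Point>"
        "</Placemark>"
    )


# Hypothetical row, shaped like csv.DictReader output for stops.txt.
row = {"stop_name": "Dupont Circle", "stop_lat": "38.911129", "stop_lon": "-77.044428"}
print(station_placemark(row))
```

Passing the name through `escape` matters for stations whose names contain an ampersand, which would otherwise produce invalid XML.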
Rail lines are drawn with the LineString tag, like below:
    <Placemark>
      <name>Yellow Line</name>
      <LineString>
        <extrude>1</extrude>
        <tessellate>1</tessellate>
        <coordinates>
          -77.074955,38.793895,0
          -77.070831,38.800345,0
          -77.060796,38.806506,0
          etc
        </coordinates>
      </LineString>
    </Placemark>
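A line placemark of this shape could be assembled from an ordered list of points. This is only a Python sketch, not the original PHP: the function name `line_placemark` is hypothetical, and it assumes the points are already in order along the line (for example, the stations of one trip in sequence).

```python
from xml.sax.saxutils import escape


def line_placemark(name, points):
    """Build a KML LineString placemark from ordered (lon, lat) pairs."""
    # Each tuple is "lon,lat,alt" with no internal spaces;
    # whitespace separates one tuple from the next.
    coords = " ".join(f"{lon},{lat},0" for lon, lat in points)
    return (
        "<Placemark>"
        f"<name>{escape(name)}</name>"
        "<LineString>"
        "<extrude>1</extrude>"
        "<tessellate>1</tessellate>"
        f"<coordinates>{coords}</coordinates>"
        "</LineString>"
        "</Placemark>"
    )


# The first two points of the Yellow Line sample above.
yellow = line_placemark("Yellow Line", [(-77.074955, 38.793895), (-77.070831, 38.800345)])
print(yellow)
```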
The fun comes with the trains, since they’re the ones moving. Google added a custom tag to KML for moving objects, called gx:Track. (I’m guessing the gx prefix stands for Google extension.) It’s designed to work with any GPS recording device. It just needs a list of times, followed by a list of coordinates (longitude, latitude, and elevation). A sample track for a train’s schedule would look like:
    <Placemark>
      <name>West Falls Church to Vienna (05:10:00)</name>
      <gx:Track>
        <when>2012-02-22T05:10:00Z</when>
        <when>2012-02-22T05:14:06Z</when>
        <when>2012-02-22T05:18:00Z</when>
        <gx:coord>-77.188871 38.900738 0</gx:coord>
        <gx:coord>-77.228654 38.883146 0</gx:coord>
        <gx:coord>-77.271363 38.877881 0</gx:coord>
      </gx:Track>
    </Placemark>
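To show how a track like this might be built from schedule data, here is a hedged Python sketch (again, not the original PHP). GTFS stores times as `HH:MM:SS` strings relative to the service day, and the hour may run past 24 for trips that continue after midnight, so the helper adds the time to the service date instead of parsing it as a clock time. The names `gtfs_time_to_when` and `track_placemark` are hypothetical. The `Z` suffix follows the sample above, which means local times are labeled as UTC.

```python
from datetime import datetime, timedelta
from xml.sax.saxutils import escape


def gtfs_time_to_when(service_date, hhmmss):
    """Turn a GTFS date (YYYYMMDD) and time (HH:MM:SS) into a KML <when> value."""
    hours, minutes, seconds = (int(part) for part in hhmmss.split(":"))
    # Adding a timedelta handles GTFS hours of 24 and above (after-midnight trips).
    moment = datetime.strptime(service_date, "%Y%m%d") + timedelta(
        hours=hours, minutes=minutes, seconds=seconds
    )
    return moment.strftime("%Y-%m-%dT%H:%M:%SZ")


def track_placemark(name, service_date, visits):
    """visits: ordered (time, lon, lat) tuples for the stops of one trip."""
    # gx:Track lists all the times first, then the matching coordinates.
    whens = "".join(
        f"<when>{gtfs_time_to_when(service_date, t)}</when>" for t, _, _ in visits
    )
    coords = "".join(f"<gx:coord>{lon} {lat} 0</gx:coord>" for _, lon, lat in visits)
    return (
        f"<Placemark><name>{escape(name)}</name>"
        f"<gx:Track>{whens}{coords}</gx:Track></Placemark>"
    )


trip = [
    ("05:10:00", "-77.188871", "38.900738"),
    ("05:14:06", "-77.228654", "38.883146"),
    ("05:18:00", "-77.271363", "38.877881"),
]
track = track_placemark("West Falls Church to Vienna (05:10:00)", "20120222", trip)
print(track)
```

For the `gx:` tags to be valid, the root `kml` element of the file has to declare the extension namespace, `xmlns:gx="http://www.google.com/kml/ext/2.2"`.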
There are 5 files in the GTFS data set that I need to use to build the KML:
- routes.txt (needed just for the route_id codes for the 5 rail lines)
- trips.txt (each departure)
- stop_times.txt (the scheduled time at each station for each trip)
- stops.txt (stations)
- calendar_dates.txt (or calendar.txt, if it exists)
To make the project manageable, I started by getting rid of the Metrobus and Circulator data (see Filtering Metrorail Data from WMATA GTFS). Then I had to select a sample day (see Finding WMATA GTFS Data for a Specific Day), which made stop_times even smaller. I wrote a PHP program to convert the CSV files into KML.
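The filtering step can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the method from the linked posts: it assumes rail routes can be recognized by `route_type` 1 (the GTFS code for subway or metro service) and that the sample day's service is listed in calendar_dates.txt with `exception_type` 1 (service added). The helper names and the tiny inline feeds are hypothetical.

```python
import csv
import io


def read_rows(text):
    """Parse CSV text into a list of dicts, one per row."""
    return list(csv.DictReader(io.StringIO(text)))


def rail_trip_ids(routes, trips, calendar_dates, day):
    """Return trip_ids that run on rail routes on the given day (YYYYMMDD)."""
    # Assumption: rail routes carry route_type 1 in routes.txt.
    rail_routes = {r["route_id"] for r in routes if r["route_type"] == "1"}
    # Assumption: calendar_dates.txt lists each service day explicitly.
    active = {
        c["service_id"]
        for c in calendar_dates
        if c["date"] == day and c["exception_type"] == "1"
    }
    return [
        t["trip_id"]
        for t in trips
        if t["route_id"] in rail_routes and t["service_id"] in active
    ]


# Hypothetical miniature feed, for illustration only.
routes = read_rows("route_id,route_type\nRED,1\n42,3\n")
trips = read_rows("route_id,service_id,trip_id\nRED,WK,t1\nRED,SA,t2\n42,WK,t3\n")
dates = read_rows("service_id,date,exception_type\nWK,20120227,1\nSA,20120225,1\n")

selected = rail_trip_ids(routes, trips, dates, "20120227")
print(selected)
```

Once the trip IDs are known, stop_times.txt can be reduced the same way by keeping only the rows whose `trip_id` is in the selected set. A feed that uses calendar.txt instead would need a check of the weekday columns and the start and end dates.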
When Google Earth loads a KML file with time data, it displays a time slider. Clicking the play button will animate the trains. When viewing the entire day of transit service, it looks like a jumble, because the trains move so fast you can’t tell if they’re coming or going. The time slider has buttons to let you zoom into a smaller time range, which would slow down the motion, but the zoom buttons are greyed out for me, and not clickable. The wrench icon opens up the time options dialogue, where you can adjust the normal speed, but not enough to make a day go by slowly enough to follow the trains.
Now, a word about accuracy. The animation looks plausible, but I also wanted to count the trips as a check. This list shows all the different trips offered, from starting point to end point, and counts the number of trips made on a typical Monday.
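A count like this could be produced by grouping stop_times.txt rows by trip and looking at the first and last stop of each. The Python sketch below is one way to do it and is not the original PHP. `trip_id`, `stop_sequence`, and `stop_id` are GTFS field names; the function name, the stop IDs, and the sample rows are hypothetical.

```python
from collections import Counter, defaultdict


def count_trips_by_endpoints(stop_times, stop_names):
    """Count trips per (first station, last station) pair."""
    visits_by_trip = defaultdict(list)
    for row in stop_times:
        visits_by_trip[row["trip_id"]].append((int(row["stop_sequence"]), row["stop_id"]))

    counts = Counter()
    for visits in visits_by_trip.values():
        visits.sort()  # order each trip's stops by stop_sequence
        first_stop, last_stop = visits[0][1], visits[-1][1]
        counts[(stop_names[first_stop], stop_names[last_stop])] += 1
    return counts


# Hypothetical rows, for illustration only.
names = {"A": "Vienna", "B": "Dunn Loring", "C": "West Falls Church"}
rows = [
    {"trip_id": "t1", "stop_sequence": "1", "stop_id": "A"},
    {"trip_id": "t1", "stop_sequence": "2", "stop_id": "B"},
    {"trip_id": "t1", "stop_sequence": "3", "stop_id": "C"},
    {"trip_id": "t2", "stop_sequence": "2", "stop_id": "A"},
    {"trip_id": "t2", "stop_sequence": "1", "stop_id": "C"},
]
endpoint_counts = count_trips_by_endpoints(rows, names)
for (origin, destination), total in sorted(endpoint_counts.items()):
    print(f"{origin} to {destination}: {total}")
```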
There are a few things that make me question the data’s integrity. Why are there 122 trains leaving from Largo, but only 116 returning? I do sometimes see empty trains traveling on the tracks, though I’m not sure why Metro would need to move trains around without picking up passengers. I also found it odd that there are 16 trips from West Falls Church to Vienna, only two stations to the west; perhaps that is just feeding the system from the rail yard at Falls Church. Stranger yet, there are 25 trips from Vienna that end in West Falls Church. Why would a train stop there, rather than at the opposite end of the Orange Line? Because the data’s visualization looks mostly accurate, I’m going to keep it as is, but will re-load the data from WMATA later in the spring to see if it has changed. (Please comment if you think you know what is wrong.)
My KML samples are below, to be viewed in Google Earth. They are derived from GTFS data downloaded from WMATA’s site on February 24, 2012. The data was filtered for Metrorail trips on February 27, 2012. The KML was zipped into a KMZ file, including the icon images for trains and stations.
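For anyone packaging their own file, a KMZ is an ordinary zip archive with the main KML document at the top level, conventionally named doc.kml, plus any images it refers to. The Python sketch below shows the idea. The function name `write_kmz` and the `files/` folder for icons are assumptions, and the icon `href` values inside the KML would have to use the same relative paths.

```python
import io
import os
import zipfile


def write_kmz(target, kml_text, icon_paths=()):
    """Write a KMZ archive to a file path or a writable binary file object."""
    with zipfile.ZipFile(target, "w", zipfile.ZIP_DEFLATED) as kmz:
        # The main document sits at the root of the archive.
        kmz.writestr("doc.kml", kml_text)
        # Assumption: icons are stored under files/ and referenced
        # from the KML as files/<name>.
        for path in icon_paths:
            kmz.write(path, arcname="files/" + os.path.basename(path))


# Build a small archive in memory to show the resulting layout.
buffer = io.BytesIO()
write_kmz(buffer, '<kml xmlns="http://www.opengis.net/kml/2.2"><Document/></kml>')
archive_names = zipfile.ZipFile(buffer).namelist()
print(archive_names)
```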