Mystery Incorporated


Programming, photography, news, culture, and cartoons

Looking Back at 2013 CaBi Data


January 20th, 2014 [programming]

Want to look back at 2013 using Capital Bikeshare data? I’ve put together an interactive tool to examine the 2013 daily ridership statistics for Capital Bikeshare. The data looks at daily “bikeout” totals, that is, how many bikes were checked out each day. You can summarize the data into weeks, months, quarters, and days of the week. The weekly view ignores December 31, in order to avoid having a 53rd week with only a single day in it.
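The weekly rollup described above can be sketched roughly as follows. This is only an illustration, not the tool's actual code: the function name `weeklyTotals` and the input shape (an array of 365 daily totals, with index 0 being January 1) are assumptions, as is the idea that weeks are counted in seven-day blocks starting on January 1.

```javascript
// Sketch: roll daily totals up into 52 seven-day weeks.
// Assumption: `daily` holds one number per day of 2013, index 0 = January 1.
function weeklyTotals(daily) {
  if (daily.length < 364) {
    throw new Error("expected at least 364 daily totals");
  }
  const weeks = [];
  // 52 full weeks cover indexes 0..363; index 364 (December 31) is ignored,
  // so there is no 53rd week containing a single day.
  for (let w = 0; w < 52; w++) {
    let sum = 0;
    for (let d = w * 7; d < (w + 1) * 7; d++) {
      sum += daily[d];
    }
    weeks.push(sum);
  }
  return weeks;
}
```

Months, quarters, and days of the week could be summarized the same way by changing how each day's index is mapped to a bucket.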

You can compare the difference between bikeouts from subscribers (those with memberships for a month or a year) and casual riders (those with memberships for 1 or 3 days). You can also look at bikeout stats for any of the 306 individual stations.

The program lets you find the correlation between any two data sets. You can use a second data set to color the bars of the first data set. The correlation is automatically calculated. It ranges from 1, a perfect positive correlation, to -1, a perfect negative correlation. 0 means there is no correlation.
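For readers who want to see what such a calculation looks like, here is a minimal sketch that assumes the value is the standard Pearson correlation coefficient (the post does not say which formula the tool uses). The function name `pearson` and its array inputs are hypothetical.

```javascript
// Sketch: Pearson correlation between two equal-length arrays of numbers.
// Assumption: this is the kind of coefficient the tool reports; the post
// does not confirm the exact formula.
function pearson(xs, ys) {
  if (xs.length !== ys.length || xs.length === 0) {
    throw new Error("inputs must be non-empty and the same length");
  }
  const n = xs.length;
  const meanX = xs.reduce((a, b) => a + b, 0) / n;
  const meanY = ys.reduce((a, b) => a + b, 0) / n;
  let cov = 0;
  let varX = 0;
  let varY = 0;
  for (let i = 0; i < n; i++) {
    const dx = xs[i] - meanX;
    const dy = ys[i] - meanY;
    cov += dx * dy;
    varX += dx * dx;
    varY += dy * dy;
  }
  const denom = Math.sqrt(varX * varY);
  // If either series is constant, the correlation is undefined.
  return denom === 0 ? NaN : cov / denom;
}
```

Passing in, say, 365 daily bikeout totals and 365 daily temperatures would give a single number between -1 and 1.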

In order to compare CaBi usage to general biking, I added a data set from automatic bike counters in Arlington. I combined data from 13 of their counters, which measure bikes in both directions. (They distinguish pedestrian traffic, which I did not include.) You can see this data set in action via the Activity Mapper.

My theory was that CaBi’s usage fluctuations would resemble local general biking patterns more than bikesharing in other cities. So, I added data sets for bikesharing in London (Barclays Cycle Hire, via the London Datastore) and Minneapolis/St Paul (Nice Ride, which released their 2013 data). Note that Nice Ride shuts down for the winter.

We already know that bikeshare usage is highly seasonal (high in summer and low in winter), but I was wondering what factors contribute to riders’ behavior. So I added a few environmental data sets for Washington, DC: temperature, precipitation (via wunderground.com), and daylight hours (via the Astronomical Information Center).

Correlation with CaBi bikeouts (all users; all stations)

period    | temperature | precipitation | daylight hours
daily     |  0.80       | -0.31         |  0.61
weekly    |  0.90       | -0.13         |  0.70
monthly   |  0.96       |  0.13         |  0.75
quarterly |  0.99       |  0.16         |  0.80

The strongest positive correlation for CaBi is with temperature: the higher the temperature, the higher the ridership. Obviously this is not proof of causation, but it makes sense that people are more likely to go biking when it is warm out. There is also a strong correlation with daylight hours.

Why is the biggest change in correlation when we jump from daily totals to weekly totals? I’d say it’s because CaBi ridership patterns change between weekdays and weekends, and the weather of course does not care about these things. So when you total them up for a weekly summary, these variations (which were present in only the CaBi data) no longer matter.

When compared on a daily scale, precipitation (rain and snow) has a negative correlation, meaning the more precipitation, the lower the ridership. -0.31 is a weak (but not insignificant) correlation. Why is that? When you look at the distribution of bikeouts and see an abrupt dip, it’s usually on a day that had significant rainfall. I think the correlation would be stronger if I measured data for each hour. Few rainstorms last all day long. If the rain occurs at night when folks are sleeping, it won’t have much of an impact on their biking behavior. I find it interesting that precipitation has a stronger correlation with subscribers (-0.29) than with casual riders (-0.20). I’m going to guess that subscribers are more likely to not use CaBi if the forecast looks doubtful, whereas casual users are more concerned with whether it’s raining right now.

The CaBi usage is more correlated with Arlington bike traffic levels (0.78 correlation) than with London’s bikeshare system (0.47 correlation) or Nice Ride in the Twin Cities (0.63 correlation), which makes sense. That is, as long as you look at daily patterns.

Correlation with CaBi bikeouts (all users; all stations)

period    | Arlington bike counts | Barclays hires | Nice Ride bikeouts
daily     | 0.78                  | 0.47           | 0.63
weekly    | 0.79                  | 0.69           | 0.78
monthly   | 0.84                  | 0.82           | 0.87
quarterly | 0.88                  | 0.93           | 0.95

I can understand why the correlation would increase as you compare longer time frames, since that averages out special events and weather patterns, but I am surprised to see London and the Twin Cities resemble CaBi more than Arlington cyclists do when viewed on a quarterly basis. But then, using only four data points doesn’t make for the best analysis, I suppose.

I was also surprised to see a stronger correlation for Arlington with casual CaBi users (0.76) than with CaBi subscribers (0.57).

The program was written in JavaScript, using the D3 library to render the animated bar charts. I am slowly getting used to D3’s method-chaining syntax (see Getting Started with D3 in JavaScript). Once I figured out how to string together D3 data descriptions, I got to concentrate on gathering the data and building the UI.

The CaBi data came from four quarterly sets from Capital Bikeshare’s System Data page. I wrote a Java program to convert the data into daily totals.
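The conversion itself is simple to picture. The sketch below is in JavaScript rather than Java and is not the program mentioned above; it only illustrates the idea of counting trips per day. The function name `dailyBikeouts`, the `startDate` field, and its "YYYY-MM-DD hh:mm" format are all assumptions, and the real System Data files may use different column names and date formats.

```javascript
// Sketch: count checkouts ("bikeouts") per calendar day.
// Assumption: each trip is an object with a hypothetical `startDate`
// string that begins with the date as "YYYY-MM-DD".
function dailyBikeouts(trips) {
  const totals = new Map();
  for (const trip of trips) {
    const day = trip.startDate.slice(0, 10); // keep only the date part
    totals.set(day, (totals.get(day) || 0) + 1);
  }
  return totals; // Map of day string -> number of bikeouts
}
```

Filtering the trips by rider type or start station before counting would give the subscriber, casual, and per-station series.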

There are many more features and data sets that could be added to the 2013 daily ridership statistics for Capital Bikeshare app. Which ones would make the tool more useful for you?

July 2014 Update: I added a new set of daily stats, CaBi imbalances. The number is calculated by adding the imbalances of each station, where a station’s imbalance is the absolute value of the difference between bikeouts and bikeins at the end of each day.
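As a rough illustration of that calculation, here is a small sketch for a single day. The function name `dailyImbalance` and the input shape (one object per station with `bikeouts` and `bikeins` counts for that day) are assumptions made for the example.

```javascript
// Sketch: system-wide imbalance for one day.
// Assumption: `stations` is an array of objects holding that day's
// `bikeouts` and `bikeins` counts for each station.
function dailyImbalance(stations) {
  return stations.reduce(
    // A station's imbalance is |bikeouts - bikeins|; sum over all stations.
    (total, s) => total + Math.abs(s.bikeouts - s.bikeins),
    0
  );
}
```

For example, a station with 10 bikeouts and 7 bikeins contributes 3, and a station with 3 bikeouts and 8 bikeins contributes 5, for a daily imbalance of 8.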

Site by M.V. Jantzen
mvs202 "at" gmail.com
twitter.com/mvs202