With over half a million geotagged photos on Flickr in the Washington, DC region, we can use that data to explore our community. For this post, I am going to run text searches to see whether certain words show a geographic pattern in their usage. Text searches in Flickr’s API are matched against each photo’s title, description, and tags. I’m using a map of 560 by 320 pixels, which at this zoom level covers 20.64 by 11.81 miles (about 244 square miles).
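The post doesn't show how the searches were issued, so the following is only a minimal sketch, assuming the map is divided into a rectangular grid and each cell is queried separately. It builds a request URL for Flickr's flickr.photos.search REST method using its text, bbox (min_lon,min_lat,max_lon,max_lat) and has_geo parameters; the helper names grid_cells and search_url, the grid dimensions, and the API key are hypothetical placeholders.

```python
from urllib.parse import urlencode

# Hypothetical helper: split a bounding box into a rows x cols grid.
# Each cell is (min_lon, min_lat, max_lon, max_lat), the order Flickr's
# bbox parameter expects.
def grid_cells(min_lon, min_lat, max_lon, max_lat, rows, cols):
    dlon = (max_lon - min_lon) / cols
    dlat = (max_lat - min_lat) / rows
    cells = []
    for r in range(rows):
        for c in range(cols):
            cells.append((
                min_lon + c * dlon,
                min_lat + r * dlat,
                min_lon + (c + 1) * dlon,
                min_lat + (r + 1) * dlat,
            ))
    return cells

# Hypothetical helper: build a flickr.photos.search URL for one grid cell.
# The "text" parameter matches titles, descriptions, and tags.
def search_url(api_key, text, cell):
    params = {
        "method": "flickr.photos.search",
        "api_key": api_key,          # placeholder; supply your own key
        "text": text,
        "bbox": ",".join(f"{v:.6f}" for v in cell),
        "has_geo": 1,                # only geotagged photos
        "per_page": 1,               # assumption: only the total count is needed
        "format": "json",
        "nojsoncallback": 1,
    }
    return "https://api.flickr.com/services/rest/?" + urlencode(params)
```

Under this assumption, the per-cell count would be read from the photos.total field of each JSON response, so only one small page per cell needs to be fetched.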

To start with, I picked terms that should clearly map to certain areas: “Virginia,” “Maryland,” and “District of Columbia.” “Virginia” returned 23,840 photos, shown in blue. “Maryland” returned 17,477 photos, shown in red. “District of Columbia” returned 32,386 photos, shown in green. The heat map uses the natural logarithm of the totals so that a few high-intensity spikes don’t dominate the map.
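A minimal sketch of the log scaling, assuming each term's per-cell counts drive one color channel as in the blue/red/green assignment above. Using ln(1 + n) via math.log1p so that empty cells map to zero, and normalizing by the largest cell, are assumptions rather than the post's documented method; log_intensities and rgb_cells are hypothetical names.

```python
import math

def log_intensities(counts):
    """Scale raw per-cell counts to 0..1 with the natural logarithm.

    Assumption: ln(1 + n) is used so a cell with zero photos maps to 0,
    and values are divided by the largest cell's log count."""
    peak = max(math.log1p(n) for n in counts)
    if peak == 0:
        return [0.0 for _ in counts]
    return [math.log1p(n) / peak for n in counts]

def rgb_cells(red_counts, green_counts, blue_counts):
    """Combine three terms' counts into one 0..255 RGB tuple per cell."""
    channels = [log_intensities(c) for c in (red_counts, green_counts, blue_counts)]
    return [
        tuple(round(255 * ch[i]) for ch in channels)
        for i in range(len(red_counts))
    ]
```

For the map above, the red channel would take the “Maryland” counts, green “District of Columbia,” and blue “Virginia.” With the logarithm, a cell with 100 times the photos of another is only twice as bright, which is what keeps the spikes from swamping everything else.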

I then zoomed out to put the USA, Canada, and Mexico on the map. I searched for “ocean,” “forest,” and “mountain” to see whether those words matched the physical topography. “Ocean” returned 211,822 photos, shown in blue. “Forest” returned 210,899 photos, shown in green. “Mountain” returned 383,520 photos, shown in red.

Photos with the words “ocean,” “forest,” and “mountain” do indeed occur where I would expect. Now that I have visual confirmation that the methodology works, it’s time to experiment with search terms that aren’t topographical. The map below is a two-tone heat map of Flickr’s geotagged photos that contain the word “cowboy.” The search found 25,804 of them:

Other than a clear spike in cowboy photos in Dallas, Texas, it’s hard to draw conclusions from this map, because I can’t tell whether the dense regions reflect a local fascination with cowboys or simply a region where people take a lot of photos in general. I need to normalize the data to find where spikes occur relative to the total number of photos. To do that, I need the count of all photos across the same grid. All 29,464,097 photos are shown below.

To normalize the data, I computed, for each grid square, the ratio of “cowboy” photos to that square’s total photos. That ratio produces the following map:
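The normalization step amounts to one division per grid square. A minimal sketch, assuming the two searches return parallel lists of per-cell counts (the name term_ratios and the choice to mark empty cells with None are hypothetical):

```python
def term_ratios(term_counts, totals):
    """Share of each cell's photos that matched the search term.

    term_counts[i] is the number of photos in cell i matching the term;
    totals[i] is the number of geotagged photos of any kind in cell i.
    Cells with no photos at all get None instead of dividing by zero."""
    ratios = []
    for hits, total in zip(term_counts, totals):
        ratios.append(hits / total if total else None)
    return ratios
```

Skipping empty cells, rather than treating them as 0%, keeps areas with no photos at all visually distinct from areas where people photograph plenty but never mention the term.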

I changed the two-tone split from 2% to 4% and arbitrarily changed the color. In this view, the most cowboyish part of North America is north-central Montana, where 8% of the 297 geotagged Flickr photos contain the word “cowboy.” Close behind is the square over Interstate 40 near the Oklahoma/Texas border, where 7.5% of the 1,378 photos contain the word. In the region around Dallas, only 1% of the 254,878 photos contain it.
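The figures above can be checked with a few lines of arithmetic. This sketch assumes that a “two-tone split” means cells below the split are drawn in one tone and cells at or above it in another; the post doesn't define the term, so the tone function is a hypothetical interpretation. The photo counts it prints are back-calculated from the rounded percentages, so they are approximate.

```python
# Assumption: below the split is one tone, at or above it is the other.
def tone(ratio, split=0.04):
    if ratio is None:
        return "empty"
    return "high" if ratio >= split else "low"

# Squares quoted in the post: (description, share containing "cowboy", total photos).
cells = [
    ("north-central Montana", 0.08, 297),
    ("I-40 near the Oklahoma/Texas border", 0.075, 1378),
    ("Dallas area", 0.01, 254878),
]

for name, ratio, total in sorted(cells, key=lambda c: c[1], reverse=True):
    # Percentages in the post are rounded, so these counts are approximate.
    approx_hits = round(ratio * total)
    print(f"{name}: {ratio:.1%} of {total:,} photos, "
          f"about {approx_hits:,} cowboy photos, tone={tone(ratio)}")
```

The Dallas square holds roughly a hundred times as many cowboy photos as the Montana square (about 2,549 versus about 24), yet it falls in the low tone once each count is divided by the square’s total.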

Geotagging photos is still in its infancy. Though many cell phones have GPS built in, it’s still rare in dedicated cameras. As more geotagged photos are made public, researchers will have a new tool available.


Projecting Word Frequency onto Maps
