Our unit in Intro DH right now is on mapping. In class we’ll be working on creating maps with Palladio. We also had a preliminary introduction to data, tables, and maps by experimenting with Google Fusion Tables. In preparation for class, I imported a data set consisting of a list of images from the Cushman Archive into a few different tools to experiment.
Here is the data on a Google Fusion Tables map:
This is Miriam Posner’s version of the data. She downloaded it from the Cushman Archive site, restricted the dates slightly, and cleaned it up. The data went straight into Google’s Fusion Tables as is. The map shows the locations of the objects photographed, with one dot for every photograph. Locations are stored as longitude-latitude geocoordinates.
Then I tried CartoDB. I’d never used it before, but it’s fairly user-friendly for anyone willing to spend some time playing around and seeing what works and what doesn’t. The first thing I discovered was that CartoDB (unlike Fusion Tables) does not like geocoordinates in one field. In the Cushman dataset, the longitude and latitude were together in one field, but in CartoDB they must be disaggregated. So to create the following map in CartoDB, I first followed the instructions in their FAQ to create separate columns for longitude and latitude. Then I had fun playing with their map options.
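I did the split inside CartoDB by following their FAQ, but the same disaggregation can be done before uploading. Here is a minimal Python sketch of the idea. It assumes the combined field holds two comma-separated numbers with latitude first; the field name `geocoordinates` and the sample rows are made up for illustration, so check the order in your own copy of the data.

```python
def split_coordinates(value):
    """Split a 'lat, lon' string into a (latitude, longitude) pair of floats."""
    first, second = (part.strip() for part in value.split(","))
    return float(first), float(second)


# Hypothetical rows; the real Cushman field names may differ.
rows = [
    {"city": "Atlanta", "geocoordinates": "33.749, -84.388"},
    {"city": "Chicago", "geocoordinates": "41.8781,-87.6298"},
]

for row in rows:
    # Replace the combined field with two separate numeric columns.
    row["latitude"], row["longitude"] = split_coordinates(row.pop("geocoordinates"))
```

After the loop, each row has its own `latitude` and `longitude` values, which is the shape CartoDB wanted.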
This is just a plain map, but with the locations color coded by the primary genre of each photograph (direct link to CartoDB map):
This one shows the photographs over time (go to the direct link to CartoDB map, because on the embedded map below, the legend blocks the slider):
Then I decided I wanted to see if I could map based on states or cities (for example, summing the number of photographs in a certain state, and color-coding or sizing the dots on the map based on the number of photographs from that city or state). So I used the same process to disaggregate cities and states as I had used for longitude and latitude; I just changed the field names. I noticed, though, that geocoding by city led to some incorrect locations. If you zoom out in the map below, you’ll see that some of the photographs of objects in Atlanta, Georgia, have been placed in the Caucasus, in the countries of Georgia and Armenia. This map is the result of several attempts to clean the data through automation, simply by telling CartoDB again to geocode the cities or states. It didn’t work well.
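One thing that may help with the Georgia problem is to give the geocoder a less ambiguous string by combining city, state, and country into a single field before geocoding. This is only a sketch: the function name and the default country are my own assumptions, I have not confirmed that it fixes CartoDB's results, and the Mexican locations would need a different country value.

```python
def qualified_place(city, state, country="USA"):
    """Join the non-empty parts into one place string, e.g. 'Atlanta, Georgia, USA'."""
    parts = [part.strip() for part in (city, state, country) if part and part.strip()]
    return ", ".join(parts)


# Hypothetical examples of the strings that would be sent to a geocoder.
us_place = qualified_place("Atlanta", "Georgia")
mexican_place = qualified_place("Ciudad Juarez", "Chihuahua", country="Mexico")
state_only = qualified_place("", "Georgia")
```

Here `us_place` is "Atlanta, Georgia, USA", which at least no longer looks like a bare reference to the country of Georgia.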
I also couldn’t figure out a good way to visualize density (the number of photographs from each state, for example). So I downloaded my new dataset from CartoDB as a CSV file and then imported it into Tableau (Desktop 9.0). By dragging and dropping the “state” field onto the workspace, I quickly created a map showing all the states where photographs in the collection had been taken:
Then I dragged and dropped Topical Subject Heading 1 (under the Dimensions list on the left in Tableau) onto my map, followed by the “Number of Records” measure (under the Measures list on the left). That gave me a series of maps, one for each of the subjects listed in the TSH1 field:
Note that Tableau kindly tells you how many entries it was unable to map (the “## unknown” note in the lower right)!
Below I’ve summed the number of records (ignoring genre, topical subject, etc.) for each state. For this, it’s better to use the graded color option than the stepped color option. If you have just five steps or stages of color, it looks like most of the states have the same number of images, when in fact the counts vary more than that. The graded color (used below) shows the variations better.
This map also shows that the location information for photographs from Mexico was not interpreted properly by Tableau. Sonora (for which there is data) is not highlighted.
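The per-state totals behind this kind of map can also be checked outside Tableau, which is handy for spotting places (like Sonora) that have data but never show up on the map. Here is a small sketch using Python's standard `csv` module and `collections.Counter`. The inline sample and the column name `state` stand in for the CSV I exported from CartoDB; they are illustrative, not the real file.

```python
import csv
import io
from collections import Counter


def count_by_field(csv_file, field):
    """Count rows per distinct non-empty value of one column."""
    counts = Counter()
    for row in csv.DictReader(csv_file):
        value = (row.get(field) or "").strip()
        if value:  # skip rows where the field is blank
            counts[value] += 1
    return counts


# Stand-in for the exported file; replace with open("yourfile.csv", newline="").
sample_csv = io.StringIO(
    "city,state\n"
    "Atlanta,Georgia\n"
    "Chicago,Illinois\n"
    "Savannah,Georgia\n"
    "Hermosillo,Sonora\n"
    "Unknown,\n"
)

state_counts = count_by_field(sample_csv, "state")
for state, total in state_counts.most_common():
    print(state, total)
```

Comparing a list like this against what the map highlights makes it easier to see which locations the mapping tool silently dropped.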
Then I decided hey, why not a bubble map of locations, so here we go. It uses the same data as the map above, but I selected a different kind of visualization (called “Packed Bubbles” in Tableau).
When I hovered on some of the bubbles, I could easily see the messy data in Tableau. Ciudad Juarez is one of the cities/states that got mangled during import, probably due to the accent:
Finally, a simple map with circles sized according to the number of photographs from each location. (Again, it clearly shows that the info from Mexico is not visible. In fact, 348 items seem not to be mapped.)
Obviously the next step would be to clean the data, probably using OpenRefine (formerly Google Refine), and then reload it.
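As for the mangled Ciudad Juarez entry, I only guessed that the accent was to blame. One common way accents get mangled is when UTF-8 text is read as Latin-1 somewhere along the way. The sketch below reproduces that specific failure and reverses it; it is an illustration of one possible cause, not a diagnosis of what actually happened to my file, and the `repair` helper is my own invention.

```python
def repair(text):
    """Undo UTF-8-read-as-Latin-1 mangling; return text unchanged if it doesn't apply."""
    try:
        return text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text


original = "Ciudad Juárez"

# Simulate the damage: UTF-8 bytes wrongly decoded as Latin-1.
mangled = original.encode("utf-8").decode("latin-1")

fixed = repair(mangled)              # back to the accented spelling
untouched = repair("Sonora")         # plain ASCII passes through as is
already_clean = repair(original)     # correctly encoded text is left alone
```

In this simulation `mangled` comes out as "Ciudad JuÃ¡rez", and `repair` restores it. If your mangled values look different, the cause is probably something else.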
Many, many thanks to Indiana University for making the Charles Cushman Photograph Collection data available, and for making it so well-structured and detailed. Many thanks also to Miriam Posner for cleaning the data and providing tutorials for all of us to use!