Author: Dan Trepal
In a previous blog post, we discussed the US decennial census, which is a key part of the upcoming version of our Keweenaw Time Traveler web interface. This huge, detailed record set gives us an intimate view of the lives of almost 380,000 historical residents of the Copper Country. But how did we transform this huge set of census records into an interactive digital atlas? A crucial step in that transformation is the process of geocoding. In today’s blog post I will give a quick overview of the process of geocoding and how we used geographic information systems (GIS) software – plus a lot of elbow grease – to populate the Time Traveler’s digital representation of the historical landscape with the people who lived and worked there.
What is Geocoding?
As I explained in my blog post on the census, census records can now be obtained in a digital database format that is ‘machine readable’, or in other words easily stored and manipulated on a computer. Geocoding is the process of taking each record in the census (or any piece of digital information) and assigning real-world spatial coordinates to it – a spot on the earth. This spot could be any kind of ‘place’ - a country, a state, a county, a town, a street, a home address – even a tree or other specific landmark. Anything with a known location can serve as the ‘target’ for geocoding and have digital information attached to it.
Geocoding historical data like the census can be a challenging task, for several reasons. One important problem we face is common to all historical research: historical records are usually incomplete and contain uncertainties and inaccuracies. Beginning in 1880 the census was collected by an army of enumerators – temporary employees from each community who were given basic instructions before going door-to-door filling out the census form for hundreds of households, which were later gathered together to form the census record. Overall, these enumerators did a very good job, but in many cases addresses are vague or missing, names are misspelled (how often would you spell another person’s last name correctly after hearing it just once?), or the handwriting is illegible, among other things. On top of all that, the digital version of the census we are working with was transcribed from microfilm photos of those original copies by volunteers with no local knowledge – how easy is it to read a digital scan of a microfilm copy of a smudged cursive handwritten rendering of ‘Amygdaloid Street’? A local to the Keweenaw might recognize this geological term, because it is part of our mining heritage, but others might never guess it!
Cleaning Up Historical Data
The first step in geocoding the census, then, is to ‘clean up’ the digital data, focusing on correcting errors or gaps in place names – street numbers, street names, or references to mining locations or even buildings in some cases. We do this manually, going through each record and correcting the address information to a standard set of terms. This is long, hard work. But we have learned, through extensive experience, that computers still struggle to interpret this sort of information automatically. The path between a person living in Houghton in 1880, for example, and our current record of their existence relies on a sort of 140-year game of telephone, with information being recorded, stored, copied, and converted into different forms and for different purposes over time. During this process information can be lost, altered, or simply garbled.
Once we have cleaned up the data so that everyone’s address information is as legible as we can make it, we can move on to the actual process of geocoding. Geocoding requires two pieces of information – the location of the record you want to map, as described in the record itself, and the place in the real world where you want to map the record to. In the Keweenaw Time Traveler, that second piece of information exists as a series of points on our digital map of the historical landscape.
Building a Historical Digital Landscape for the Census Records
Here is where we run into another challenge – more gaps in the historical record. The only way we can put a point down on our digital map to represent a historical home is to have a historical record of that place’s location - usually a historical map that shows where that home was and lists the address of the home. Luckily, we have historical Sanborn fire insurance maps that provide both of those pieces of information. But even these present us with a problem. They only cover part of our landscape, and only for certain years. So how do we map people if we can’t find their house on a historical map? Alternately, what if their census record is missing key address information, like the house number or street name?
To deal with this challenge, we have created four levels or scales of digital historical geography that serve as destinations for our census records on our digital historical map of the Keweenaw: Buildings, Streets, Settlements, and Enumeration Districts. Building these digital places is the next major step in the geocoding process.
Buildings: Where our historical maps show the locations and addresses of buildings, we can create a digital copy of them in our GIS software that contains the building’s real-world coordinates. This is the ‘gold standard’ for our geocoding work, the most accurate scale we can capture when the historical address data in the census is complete and the address falls within our Sanborn map coverage area.
Streets: We also created a map of the centroid of each historical street (the middle point along the street’s length) in the KeTT. Records with incomplete addresses, but with an identifiable street name, can be mapped to this point. This represents an approximate location where we are reasonably certain which street a person lived on, but unsure exactly where on the street they lived.
Settlements: This scale of geographies represents ‘places’ in the Keweenaw – this could be a village, or a mining location, or a fishing camp, or any other small place, usually in the rural parts of the Keweenaw, where street addresses are vague, don’t exist, or only existed for a short time. We can also create a Settlement point for larger towns, so that if the census record is clear that a person lived in, say, Calumet, but their home and street address information is missing or illegible, we can still map them to the town they lived in.
Enumeration Districts: The Census Bureau has created special districts for collecting the census, called Enumeration Districts. This is a way to divide up the landscape into specific ‘districts.’ Each census enumerator is responsible for collecting records on all the people within their district. Every census record is labeled with the district it was collected within. This means that 100% of the census is mappable to an enumeration district. Even if the home address, street name, and any other location information is missing or garbled, we can still map people to the enumeration district – in our case to the centroid of the district boundary (the middle point, as with the streets). In order to do that, we had to reconstruct the district boundaries from old maps and descriptions – these records were often sketchy or incomplete, making their reconstruction a difficult process. Still, the Enumeration Districts serve as an important ‘catch-all’ geography, so the effort is very much worth it.
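For readers curious about what a ‘middle point’ means in practice, here is a minimal sketch of the two kinds of centroid mentioned above: the point halfway along a street’s length, and the centroid of a district boundary. This is not our GIS software’s code. The function names are made up for illustration, and the sketch assumes the coordinates are already in a flat, projected coordinate system (not latitude and longitude) and that the district boundary is a simple polygon that does not cross itself.

```python
import math

def street_midpoint(vertices):
    """Return the point halfway along a street drawn as a line of (x, y) vertices."""
    if len(vertices) < 2:
        raise ValueError("a street needs at least two vertices")
    segments = list(zip(vertices, vertices[1:]))
    lengths = [math.dist(a, b) for a, b in segments]
    half = sum(lengths) / 2.0
    walked = 0.0
    for (a, b), length in zip(segments, lengths):
        if length > 0 and walked + length >= half:
            # The halfway point lies on this segment; interpolate along it.
            t = (half - walked) / length
            return (a[0] + t * (b[0] - a[0]), a[1] + t * (b[1] - a[1]))
        walked += length
    return vertices[-1]  # degenerate case: every vertex is the same point

def polygon_centroid(ring):
    """Return the area centroid of a simple polygon given as (x, y) vertices.

    The ring should not repeat its first vertex at the end.
    """
    twice_area = 0.0
    cx = 0.0
    cy = 0.0
    n = len(ring)
    for i in range(n):
        x0, y0 = ring[i]
        x1, y1 = ring[(i + 1) % n]
        cross = x0 * y1 - x1 * y0  # shoelace term for this edge
        twice_area += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    if twice_area == 0:
        raise ValueError("polygon has zero area")
    return (cx / (3.0 * twice_area), cy / (3.0 * twice_area))
```

For an L-shaped street with vertices (0, 0), (4, 0), (4, 2), the total length is 6, so `street_midpoint` returns (3.0, 0.0). For a square district with corners (0, 0), (2, 0), (2, 2), (0, 2), `polygon_centroid` returns (1.0, 1.0). One thing to keep in mind: for an oddly shaped district, the area centroid can fall outside the boundary itself, which is one reason GIS packages often offer an alternative ‘point inside the polygon’ option.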
Time to (Finally) Map it!
Now that we have cleaned up our census records, and created a digital historical map of the geographical destinations for all our people, the hardest work is done (whew!). Next comes the key step of actually doing the geocoding. We start by building address locators – these are a kind of digital geographic key that tells the computer where to put records based on their cleaned-up location information. We convert our four scales of geographical destinations into a digital table that the computer can compare to the addresses in our census records. When the computer finds a match between a census record and one of our destinations, it maps that record to that place. Voila! This process of comparing and matching is the actual geocoding step.
The product of that process is a new digital file in the GIS software where each person is represented as a point in space, with all their census information attached in a database. Depending on how complete their address information was, each person is mapped once, to the most detailed geography we can match them to – Buildings if possible, then Streets, then Settlements, and finally Enumeration Districts if there is not enough information to map them to any of the previous three scales.
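The ‘most detailed geography first’ logic can be sketched in a few lines of Python. To be clear, this is a simplified illustration and not the address locator our GIS software actually builds: the field names, the exact-match comparison, and every address and coordinate in the sample table are invented for the example.

```python
# Levels in order from most to least detailed.
LEVELS = ("building", "street", "settlement", "enumeration_district")

# Which (hypothetical) fields a record needs in order to be tried at each level.
KEY_FIELDS = {
    "building": ("house_number", "street", "settlement"),
    "street": ("street", "settlement"),
    "settlement": ("settlement",),
    "enumeration_district": ("enumeration_district",),
}

def normalize(value):
    """Upper-case and collapse whitespace; treat blank values as missing."""
    if value is None:
        return None
    text = " ".join(str(value).upper().split())
    return text or None

def make_key(record, level):
    """Build the lookup key for a level, or None if any needed field is missing."""
    parts = tuple(normalize(record.get(field)) for field in KEY_FIELDS[level])
    return None if None in parts else parts

def build_locator(destinations):
    """Turn a table of destinations into one lookup dictionary per level."""
    locator = {level: {} for level in LEVELS}
    for dest in destinations:
        key = make_key(dest, dest["level"])
        if key is not None:
            locator[dest["level"]][key] = (dest["x"], dest["y"])
    return locator

def geocode(record, locator):
    """Return (level, (x, y)) for the most detailed match, or None if nothing matches."""
    for level in LEVELS:
        key = make_key(record, level)
        if key is not None and key in locator[level]:
            return level, locator[level][key]
    return None

# Invented sample destinations; the coordinates are placeholders, not real locations.
DESTINATIONS = [
    {"level": "building", "house_number": "312", "street": "Fifth Street",
     "settlement": "Calumet", "x": 100.0, "y": 200.0},
    {"level": "street", "street": "Fifth Street", "settlement": "Calumet",
     "x": 110.0, "y": 205.0},
    {"level": "settlement", "settlement": "Calumet", "x": 150.0, "y": 250.0},
    {"level": "enumeration_district", "enumeration_district": "ED 12",
     "x": 300.0, "y": 400.0},
]

LOCATOR = build_locator(DESTINATIONS)
```

With this sample table, a record with house number 312 on Fifth Street in Calumet lands on the building point. A record with the same street but no house number (or a house number that is not on our maps) falls back to the street point, a record that only names Calumet falls back to the settlement point, and a record with nothing but ‘ED 12’ still gets mapped to the enumeration district. Each person is returned exactly once, at the first level that matches.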
So how did it go? You’ll soon find out! When the newest version of the Keweenaw Time Traveler launches this summer, you will be able to search and explore this new database of mapped historical Copper Country residents using our upgraded, map-based interface. As you can see, a lot of hard work and historical sleuthing happens behind the scenes in order to turn an enormous, but valuable, historical record into something that is fun and easy to explore. It also gives us a chance to see historical people within a visual picture of their historical landscape. Seeing people mapped this way allows us to start seeing things like streetscapes, neighborhoods, and the pathways of daily life. In doing this, we are re-using the census for a purpose that its original enumerators and tabulators never dreamed of – a detailed peek back into the past that can now be preserved and explored by future generations.