Author: Gary Spikeberg
Once we had run all of the directories through OCR, we had a text file of the entries from each directory. However, to actually put this data into the Time Traveler, we needed to do a little more work: break each person’s entry into its component parts and isolate just their address. We could have done this by hand, but we found that most people across all of the directories had three basic parts to their entry: their name, their profession, and where they lived. Because these components were separated by commas, it was easy to write a program that would recognize the different parts of each entry and “parse” them into separate fields. The program had to be tweaked as we went, because the directories were slightly different each year, and it also cleaned up all of those extra “specks” we had cluttering up our data. When all was said and done, we had parsed over 86,000 entries for people living and working in the Keweenaw between 1888 and 1939!
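To make the parsing idea concrete, here is a minimal sketch in Python. It is not the program the team actually wrote. The name `parse_entry`, the `Entry` fields, the sample entry, and the set of characters treated as OCR “specks” are all assumptions made for illustration.

```python
from typing import NamedTuple, Optional


class Entry(NamedTuple):
    """One directory entry split into its three parts (hypothetical layout)."""
    name: str
    profession: str
    address: str


# Characters assumed to be stray OCR "specks" at the edges of a field.
# Periods are deliberately left alone so abbreviations like "St." survive.
SPECK_CHARS = " \t·'`*^~|"


def parse_entry(line: str) -> Optional[Entry]:
    """Split a 'name, profession, address' line on commas.

    Returns None when the line does not have exactly three non-empty parts,
    so unusual entries can be set aside and checked by hand.
    """
    parts = [part.strip(SPECK_CHARS) for part in line.split(",")]
    parts = [part for part in parts if part]
    if len(parts) != 3:
        return None
    return Entry(*parts)


# A made-up entry with a few specks left over from OCR.
sample = "· Aho John , miner, 412 Pine St `"
print(parse_entry(sample))
```

Entries that come back as `None` are the ones that break the three-part pattern, which is roughly where a real program would need the year-by-year tweaks described above.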
The last step was to take all of those parsed entries and run them through a process called geocoding. Fellow Time Traveler Daniel Trepal has written an informative blog post about geocoding. To give a brief explanation, though, geocoding is the process of assigning a place on the map to everyone we could possibly find. We matched the addresses from the city directories to the addresses we recorded on our collection of historical maps, which let us link each person directly to that same building in the Time Traveler. This process wasn’t perfect: of our roughly 86,000 directory entries, we were able to map about 74,500 (a little under 87%). Compared with similar projects, however, these are actually really good results. It really speaks to the care those historical map makers took when creating the maps we use every day here at the Keweenaw Time Traveler.
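The address matching described above can be sketched as a simple lookup. This is only an illustration under stated assumptions, not the Time Traveler's actual geocoding workflow: the function names, the abbreviation table, the building IDs, and the sample addresses are all hypothetical.

```python
import re
from typing import Dict, Iterable, Optional

# A tiny, assumed table of street-type abbreviations to expand before matching.
ABBREVIATIONS = {"st": "street", "ave": "avenue", "av": "avenue"}


def normalize(address: str) -> str:
    """Lowercase, drop punctuation, and expand abbreviations."""
    tokens = re.findall(r"[a-z0-9]+", address.lower())
    return " ".join(ABBREVIATIONS.get(token, token) for token in tokens)


def geocode(address: str, buildings: Dict[str, str]) -> Optional[str]:
    """Return the building ID recorded for this address, or None if no match."""
    return buildings.get(normalize(address))


def match_rate(addresses: Iterable[str], buildings: Dict[str, str]) -> float:
    """Fraction of addresses that could be linked to a building."""
    addresses = list(addresses)
    if not addresses:
        return 0.0
    matched = sum(1 for a in addresses if geocode(a, buildings) is not None)
    return matched / len(addresses)


# Hypothetical lookup built from addresses recorded on the historical maps.
buildings = {normalize("412 Pine Street"): "bldg-001"}

print(geocode("412 Pine St.", buildings))
print(match_rate(["412 Pine St.", "9 Elm Ave"], buildings))

# The figures quoted above: about 74,500 mapped out of roughly 86,000 entries.
print(f"{74_500 / 86_000:.1%}")
```

Normalizing both sides before comparing is what lets "412 Pine St." in a directory find "412 Pine Street" on a map. Addresses that still fail to match account for the share of entries that could not be mapped.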
Thank you for reading this brief look at how we created one of our longstanding datasets for the Time Traveler. I can’t wait for you all to see what’s coming next!
The new Explore App will eventually have access to thousands of Calumet & Hecla company records! In this week's Lunchtime Chat, Zach Dill and James Juip discuss the challenges of digitizing these amazing records and share some of the cool things we found in the process!
A big thank you to our partners, the Council on Library and Information Resources (CLIR) and the Michigan Tech Archives, for making this possible!
In this week's Lunchtime Chat, Dr. Dan Trepal sits down with James Juip to discuss how the Time Traveler Team uses geocoding to populate the Time Traveler’s digital representation of the historical landscape with the people who lived and worked there.