Linking historical datasets and making them available for the Web has increasingly become a subject of research in the field of digital humanities. In the Netherlands, history is intimately related to the maritime activity because it has been essential in the development of economic, social and cultural aspects of Dutch society. As such an important sector, it has been well documented by shipping companies, governments, newspapers and other institutions.
In this master project we assume that, given the importance of maritime activity in every day life in the XIX and XX centuries, announcements on the departures and arrivals of ships or mentions of accidents or other events, can be found in newspapers.
We have taken a two-stage approach: first, an heuristic-based method for record linkage and then machine-learning algorithms for article classification to be used for filtering in combination with domain features. Evaluation of the linking method has shown that certain domain features were indicative of mentions of ships in newspapers. Moreover, the classifier methods scored near perfect precision in predicting ship related articles.
Enriching historical ship records with links to newspaper archives is significant for the digital history community since it connects two datasets that would have otherwise required extensive annotating work and man hours to align. Our work is part of the Dutch Ships and Sailors Linked Data Cloud project. The full thesis can be found here [pdf].