Linking multiple databases to create longitudinal data is an important research problem with many applications. Longitudinal data allow analysts to perform studies that would otherwise be infeasible. We have linked historical census databases to create longitudinal data that allow people to be tracked over time. Social scientists and historians have already used these data to investigate historical trends and to address questions about society, history, and the economy; such comparative, systematic research would not be possible without the linked data. The goal of the linkage is to identify the same person in multiple census collections. Imprecision in historical census data and the lack of unique personal identifiers make this task challenging. In this paper we design and employ a record linkage system that incorporates a supervised learning module for classifying pairs of records as matches or non-matches. We show that our system performs large-scale linkage, producing high-quality links and generating sufficient longitudinal data to allow meaningful social science studies. We demonstrate the impact of the longitudinal data through a study of economic changes in 19th-century Canada.
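To make the idea of classifying record pairs concrete, the following is a minimal sketch and not the system described in the paper. It assumes hypothetical record fields (`name`, `birth_year`, `birthplace`), an illustrative five-year tolerance on reported birth years, and a small hand-written logistic regression as a stand-in for whichever supervised classifier the authors use. The example records are invented for illustration.

```python
import math
from difflib import SequenceMatcher


def pair_features(a, b):
    """Turn a pair of records into a similarity vector (hypothetical fields)."""
    name_sim = SequenceMatcher(None, a["name"].lower(), b["name"].lower()).ratio()
    # Assumption: reported birth years may drift by a few years between censuses.
    year_sim = max(0.0, 1.0 - abs(a["birth_year"] - b["birth_year"]) / 5.0)
    place_same = 1.0 if a["birthplace"] == b["birthplace"] else 0.0
    return [name_sim, year_sim, place_same]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train(labeled_pairs, epochs=2000, lr=0.5):
    """Fit logistic regression by stochastic gradient descent on labeled pairs."""
    weights, bias = [0.0, 0.0, 0.0], 0.0
    examples = [(pair_features(a, b), y) for a, b, y in labeled_pairs]
    for _ in range(epochs):
        for x, y in examples:
            p = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
            error = p - y  # gradient of the log loss with respect to the score
            weights = [w - lr * error * xi for w, xi in zip(weights, x)]
            bias -= lr * error
    return weights, bias


def is_match(a, b, weights, bias, threshold=0.5):
    """Classify a record pair as a match (True) or non-match (False)."""
    x = pair_features(a, b)
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias) >= threshold


# Invented training pairs: (record from census A, record from census B, label).
training = [
    ({"name": "John Macdonald", "birth_year": 1840, "birthplace": "Ontario"},
     {"name": "John McDonald", "birth_year": 1841, "birthplace": "Ontario"}, 1),
    ({"name": "Mary Tremblay", "birth_year": 1835, "birthplace": "Quebec"},
     {"name": "Marie Tremblay", "birth_year": 1835, "birthplace": "Quebec"}, 1),
    ({"name": "John Macdonald", "birth_year": 1840, "birthplace": "Ontario"},
     {"name": "Peter Smith", "birth_year": 1860, "birthplace": "Quebec"}, 0),
    ({"name": "Mary Tremblay", "birth_year": 1835, "birthplace": "Quebec"},
     {"name": "William Brown", "birth_year": 1850, "birthplace": "Quebec"}, 0),
]

weights, bias = train(training)

same_person = is_match(
    {"name": "James Wilson", "birth_year": 1850, "birthplace": "Nova Scotia"},
    {"name": "James Wilson", "birth_year": 1850, "birthplace": "Nova Scotia"},
    weights, bias)
different_people = is_match(
    {"name": "James Wilson", "birth_year": 1850, "birthplace": "Nova Scotia"},
    {"name": "Edward Clarke", "birth_year": 1875, "birthplace": "Ontario"},
    weights, bias)
```

A real linkage system would also need a blocking step to avoid comparing every record with every other record, and far more labeled pairs than the four used here.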