Record linkage is the process of identifying records from different data sources that refer to the same entities. While most research has focused on linking individual records, new approaches have recently been proposed to link groups of records across databases. Group record linkage aims to determine whether two groups of records in two databases refer to the same entity. One application where group record linkage is of high importance is linking census data that contain household information across time. In this paper we propose a novel method for group record linkage based on multiple instance learning. Our method treats group links as bags and individual record links as instances. We extend multiple instance learning from bag-level classification to instance-level classification in order to reconstruct bags from candidate instances. The classified bag and instance samples significantly reduce the number of multiple group links (a group linked to more than one group in the other database), thereby improving the overall quality of the linked data. We evaluate our method on both synthetic data and real historical census data.
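To make the bag/instance framing concrete, here is a minimal sketch in Python. It is not the authors' method. Each candidate group link (household pair) is a bag, and each candidate record pair inside it is an instance, represented as a vector of attribute similarities. Several pieces are illustrative assumptions. `instance_score` is a hypothetical linear scorer that stands in for a trained instance classifier. The bag score uses the common multiple instance assumption that a bag is positive if at least one of its instances is positive, so it takes the maximum instance score. Multiple group links are then reduced with a simple greedy one-to-one assignment, which is swapped in here in place of the paper's bag reconstruction step.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Bag:
    """A candidate group link: group_a in database A paired with group_b in database B."""
    group_a: str
    group_b: str
    instances: List[List[float]]  # one similarity vector per candidate record pair


def instance_score(x: Sequence[float], weights: Sequence[float], bias: float) -> float:
    # Hypothetical linear scorer standing in for a trained instance classifier.
    return sum(w * xi for w, xi in zip(weights, x)) + bias


def bag_score(bag: Bag, weights: Sequence[float], bias: float) -> float:
    # Standard MIL assumption: a bag is positive if at least one instance is,
    # so the bag score is the maximum instance score (-inf for an empty bag).
    return max((instance_score(x, weights, bias) for x in bag.instances),
               default=float("-inf"))


def resolve_group_links(bags: List[Bag], weights: Sequence[float], bias: float,
                        threshold: float = 0.0) -> List[Tuple[str, str]]:
    # Keep bags classified as positive, ranked by score (highest first).
    scored = [(bag_score(b, weights, bias), b) for b in bags]
    positives = sorted((sb for sb in scored if sb[0] > threshold),
                       key=lambda sb: sb[0], reverse=True)
    # Greedy one-to-one assignment: each group may take part in at most one
    # link, which removes multiple group links.
    used_a, used_b = set(), set()
    links = []
    for _, bag in positives:
        if bag.group_a in used_a or bag.group_b in used_b:
            continue
        used_a.add(bag.group_a)
        used_b.add(bag.group_b)
        links.append((bag.group_a, bag.group_b))
    return links


if __name__ == "__main__":
    bags = [
        Bag("h1", "g1", [[0.9, 0.8], [0.2, 0.1]]),
        Bag("h1", "g2", [[0.6, 0.5]]),
        Bag("h2", "g2", [[0.7, 0.9]]),
    ]
    print(resolve_group_links(bags, weights=[1.0, 1.0], bias=-1.0))
```

In this toy input, household `h1` initially has positive links to both `g1` and `g2`. The greedy step keeps `h1`–`g1` (the higher score) and `h2`–`g2`, and drops the duplicate link `h1`–`g2`. The weights, bias, and threshold are made-up values for illustration only.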