Modern Information Retrieval
A Bayesian decision model for cost optimal record matching
The VLDB Journal — The International Journal on Very Large Data Bases
Efficient Record Linkage in Large Data Sets
DASFAA '03 Proceedings of the Eighth International Conference on Database Systems for Advanced Applications
TAILOR: A Record Linkage Tool Box
ICDE '02 Proceedings of the 18th International Conference on Data Engineering
Adaptive duplicate detection using learnable string similarity measures
Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining
Probabilistic data generation for deduplication and data linkage
IDEAL'05 Proceedings of the 6th international conference on Intelligent Data Engineering and Automated Learning
A two-step classification approach to unsupervised record linkage
AusDM '07 Proceedings of the sixth Australasian conference on Data mining and analytics - Volume 70
Febrl: a freely available record linkage system with a graphical user interface
HDKM '08 Proceedings of the second Australasian workshop on Health data and knowledge management - Volume 80
Automatic record linkage using seeded nearest neighbour and support vector machine classification
Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining
ACM SIGKDD Explorations Newsletter
Evaluation of entity resolution approaches on real-world match problems
Proceedings of the VLDB Endowment
Computer Methods and Programs in Biomedicine
A taxonomy of privacy-preserving record linkage techniques
Information Systems
Hi-index | 0.00 |
The process of identifying record pairs that represent the same real-world entity in multiple databases, commonly known as record linkage, is one of the important steps in many data mining applications. In this paper, we address one of the sub-tasks in record linkage, i.e., the problem of assigning record pairs with an appropriate matching status. Techniques for solving this problem are referred to as decision models. Most existing decision models rely on good training data, which is, however, not commonly available in real-world applications. Decision models based on unsupervised machine learning techniques have recently been proposed. In this paper, we review several existing decision models and then propose an enhancement to cluster-based decision models. Experimental results show that our proposed decision model achieves the same accuracy of existing models while significantly reducing the number of record pairs required for manual review. The proposed model also provides a mechanism to trade off the accuracy with the number of record pairs required for clerical review.