SIGMOD '98 Proceedings of the 1998 ACM SIGMOD international conference on Management of data
Efficient clustering of high-dimensional data sets with application to reference matching
Proceedings of the sixth ACM SIGKDD international conference on Knowledge discovery and data mining
Data mining: concepts and techniques
Data mining: concepts and techniques
Machine Learning
Computation of Normalized Edit Distance and Applications
IEEE Transactions on Pattern Analysis and Machine Intelligence
Interactive deduplication using active learning
Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining
Learning domain-independent string transformation weights for high accuracy object identification
Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining
Robust and efficient fuzzy match for online data cleaning
Proceedings of the 2003 ACM SIGMOD international conference on Management of data
TAILOR: A Record Linkage Tool Box
ICDE '02 Proceedings of the 18th International Conference on Data Engineering
Adaptive duplicate detection using learnable string similarity measures
Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining
Towards parameter-free data mining
Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining
A hierarchical graphical model for record linkage
UAI '04 Proceedings of the 20th conference on Uncertainty in artificial intelligence
Robust Identification of Fuzzy Duplicates
ICDE '05 Proceedings of the 21st International Conference on Data Engineering
Data Mining: Practical Machine Learning Tools and Techniques, Second Edition (Morgan Kaufmann Series in Data Management Systems)
A Comparison of Personal Name Matching: Techniques and Practical Issues
ICDMW '06 Proceedings of the Sixth IEEE International Conference on Data Mining - Workshops
Probabilistic data generation for deduplication and data linkage
IDEAL'05 Proceedings of the 6th international conference on Intelligent Data Engineering and Automated Learning
IEEE Transactions on Information Theory
A two-step classification approach to unsupervised record linkage
AusDM '07 Proceedings of the sixth Australasian conference on Data mining and analytics - Volume 70
Febrl: a freely available record linkage system with a graphical user interface
HDKM '08 Proceedings of the second Australasian workshop on Health data and knowledge management - Volume 80
Automatic record linkage using seeded nearest neighbour and support vector machine classification
Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining
Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining
The Normalized Compression Distance as a Distance Measure in Entity Identification
ICDM '09 Proceedings of the 9th Industrial Conference on Advances in Data Mining. Applications and Theoretical Aspects
ACM SIGKDD Explorations Newsletter
Hi-index | 0.00 |
The field of Record Linkage is concerned with identifying records from one or more datasets which refer to the same underlying entities. Where entity-unique identifiers are not available and errors occur, the process is non-trivial. Many techniques developed in this field require human intervention to set parameters, manually classify possibly matched records, or provide examples of matched and non-matched records. Whilst of great use and providing high quality results, the requirement of human input, besides being costly, means that if the parameters or examples are not produced or maintained properly, linkage quality will be compromised. The contributions of this paper are a critical discussion on the record linkage process, arguing for a more restrictive use of blocking in research, and evaluating and modifying the farthest-first clustering technique to produce results close to a supervised technique.