Decision models for record linkage

Authors:
Lifang Gu;Rohan Baxter
Affiliations:
CSIRO ICT Centre, Canberra, ACT, Australia;Australian Taxation Office, Canberra, ACT, Australia
Venue:
Data Mining
Year:
2006

Citing 6
Cited 7

Modern Information Retrieval

Modern Information Retrieval
A Bayesian decision model for cost optimal record matching

The VLDB Journal — The International Journal on Very Large Data Bases
Efficient Record Linkage in Large Data Sets

DASFAA '03 Proceedings of the Eighth International Conference on Database Systems for Advanced Applications
TAILOR: A Record Linkage Tool Box

ICDE '02 Proceedings of the 18th International Conference on Data Engineering
Adaptive duplicate detection using learnable string similarity measures

Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining
Probabilistic data generation for deduplication and data linkage

IDEAL'05 Proceedings of the 6th international conference on Intelligent Data Engineering and Automated Learning

A two-step classification approach to unsupervised record linkage

AusDM '07 Proceedings of the sixth Australasian conference on Data mining and analytics - Volume 70
Febrl: a freely available record linkage system with a graphical user interface

HDKM '08 Proceedings of the second Australasian workshop on Health data and knowledge management - Volume 80
Automatic record linkage using seeded nearest neighbour and support vector machine classification

Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining
Development and user experiences of an open source data cleaning, deduplication and record linkage system

ACM SIGKDD Explorations Newsletter
Evaluation of entity resolution approaches on real-world match problems

Proceedings of the VLDB Endowment
Bagging, bumping, multiview, and active learning for record linkage with empirical results on patient identity data

Computer Methods and Programs in Biomedicine
A taxonomy of privacy-preserving record linkage techniques

Information Systems

Quantified Score

Hi-index	0.00

Visualization

Abstract

The process of identifying record pairs that represent the same real-world entity in multiple databases, commonly known as record linkage, is one of the important steps in many data mining applications. In this paper, we address one of the sub-tasks in record linkage, i.e., the problem of assigning record pairs with an appropriate matching status. Techniques for solving this problem are referred to as decision models. Most existing decision models rely on good training data, which is, however, not commonly available in real-world applications. Decision models based on unsupervised machine learning techniques have recently been proposed. In this paper, we review several existing decision models and then propose an enhancement to cluster-based decision models. Experimental results show that our proposed decision model achieves the same accuracy of existing models while significantly reducing the number of record pairs required for manual review. The proposed model also provides a mechanism to trade off the accuracy with the number of record pairs required for clerical review.