A Heterogeneous Field Matching Method for Record Linkage

Authors:
Steven N. Minton;Claude Nanjo;Craig A. Knoblock;Martin Michalowski;Matthew Michelson
Affiliations:
Fetch Technologies;Fetch Technologies;University of Southern California;University of Southern California;University of Southern California
Venue:
ICDM '05 Proceedings of the Fifth IEEE International Conference on Data Mining
Year:
2005

Abstract

Record linkage is the process of determining whether two records refer to the same entity. A key subprocess is evaluating how well the individual fields, or attributes, of the records match each other. One approach to matching fields is to use hand-written, domain-specific rules. This "expert systems" approach may perform well in specific applications, but it is not scalable. This paper describes a new machine learning approach that creates expert-like rules for field matching. In our approach, the relationship between two field values is described by a set of heterogeneous transformations. Previous machine learning methods used simple models to evaluate the distance between two fields. Our approach, however, enables more sophisticated relationships to be modeled, which better capture the complex, domain-specific, common-sense phenomena that humans use to judge similarity. In several domains, we compare our approach to methods that rely on simpler, homogeneous models. By modeling more complex relationships, we produce more accurate results.
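
To make the idea of describing a field pair by a set of heterogeneous transformations more concrete, here is a minimal Python sketch. It is an illustration only and not the method from the paper: the transformation names (equal, synonym, initial, prefix, misspelling, missing), the tiny SYNONYMS table, the edit-distance threshold of 1, and the greedy token pairing are all assumptions chosen for the example. The resulting counts could serve as features for a learned classifier, which is not shown here.

```python
from collections import Counter

# Hypothetical synonym table (an assumption for this sketch, not from the paper).
SYNONYMS = {("robert", "bob"), ("william", "bill")}


def edit_distance(a, b):
    """Levenshtein distance computed with a single rolling row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def label_pair(s, t):
    """Return the first transformation type relating tokens s and t, or None."""
    if s == t:
        return "equal"
    if (s, t) in SYNONYMS or (t, s) in SYNONYMS:
        return "synonym"
    short, long_ = sorted((s, t), key=len)
    if long_.startswith(short):
        # A one-letter prefix is treated as an initial, anything longer as a prefix.
        return "initial" if len(short) == 1 else "prefix"
    if edit_distance(s, t) <= 1:
        return "misspelling"
    return None


def transformation_profile(a, b):
    """Greedily pair tokens of two field values and count transformation types."""
    left = a.lower().replace(".", "").split()
    right = b.lower().replace(".", "").split()
    counts = Counter()
    for s in left:
        for t in right:
            label = label_pair(s, t)
            if label:
                counts[label] += 1
                right.remove(t)  # each right-hand token is used at most once
                break
        else:
            counts["missing"] += 1  # no partner found for this left-hand token
    if right:
        counts["missing"] += len(right)  # right-hand tokens left unpaired
    return counts


if __name__ == "__main__":
    print(transformation_profile("Robert J. Smith", "Bob Smyth"))
    print(transformation_profile("123 Main Street", "123 Main St"))
```

A single string-distance score would collapse both example pairs into one number, whereas the profile records which kinds of differences occurred. The greedy first-match pairing is a simplification; it can pair a token with a poor partner when a better one appears later in the field.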