A Versatile Record Linkage Method by Term Matching Model Using CRF

Authors:
Quang Minh Vu;Atsuhiro Takasu;Jun Adachi
Affiliations:
National Institute of Informatics, Tokyo, Japan 101-8430;National Institute of Informatics, Tokyo, Japan 101-8430;National Institute of Informatics, Tokyo, Japan 101-8430
Venue:
DEXA '09 Proceedings of the 20th International Conference on Database and Expert Systems Applications
Year:
2009


Abstract

We address the problem of record linkage between databases whose record fields are mixed and permuted in different ways. Our method uses a conditional random field (CRF) model to find matching terms in record pairs and then uses those matching terms to detect duplicates. Although records with permuted fields may have partly reordered terms, the method can still exploit the local order of terms when finding matches. In experiments on several well-known record linkage data sets, our method showed advantages on most of them. We also ran experiments on a synthetic data set in which records combine fields in random order, and verified that the method can handle this data set as well.
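
The setting described in the abstract can be illustrated with a small sketch. It is not the authors' model: the CRF-based term matcher is replaced here by greedy exact token matching, purely as a stand-in, and the function names (permute_fields, matching_terms, local_order_score, match_ratio) and the sample record are hypothetical. The sketch only shows how records with randomly ordered fields can still share matching terms, and how runs of locally ordered terms survive field permutation.

```python
import random


def permute_fields(record, rng):
    """Join a record's field values into one string, with fields in random order.

    Loosely mimics the synthetic data described in the abstract (an assumption
    about its construction, not the authors' actual generator).
    """
    fields = list(record.values())
    rng.shuffle(fields)
    return " ".join(fields)


def matching_terms(a, b):
    """Return (index in a, index in b) pairs of matching terms.

    Greedy exact matching on lowercased tokens. This is a stand-in for the
    CRF-based term matcher, which is not reproduced here.
    """
    terms_a, terms_b = a.lower().split(), b.lower().split()
    used_b = set()
    pairs = []
    for i, term in enumerate(terms_a):
        for j, other in enumerate(terms_b):
            if j not in used_b and term == other:
                used_b.add(j)
                pairs.append((i, j))
                break
    return pairs


def local_order_score(pairs):
    """Fraction of consecutive matched pairs that are adjacent in both records."""
    if len(pairs) < 2:
        return 0.0
    kept = sum(
        1
        for (i1, j1), (i2, j2) in zip(pairs, pairs[1:])
        if i2 == i1 + 1 and j2 == j1 + 1
    )
    return kept / (len(pairs) - 1)


def match_ratio(a, b):
    """Share of terms matched, relative to the longer record."""
    longest = max(len(a.split()), len(b.split()))
    return len(matching_terms(a, b)) / longest if longest else 0.0


if __name__ == "__main__":
    # Hypothetical record, used only for illustration.
    record = {"author": "Jun Adachi", "title": "Record Linkage", "year": "2009"}
    original = " ".join(record.values())
    shuffled = permute_fields(record, random.Random(0))
    print(shuffled, match_ratio(original, shuffled))
```

In this sketch a fully permuted copy of a record still reaches a match ratio of 1.0, while local_order_score shows how much within-field term order is preserved. A learned matcher such as the CRF in the paper would presumably also handle inexact terms, which this exact-match stand-in does not.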