Constraint-based entity matching

Authors:
Warren Shen; Xin Li; AnHai Doan
Affiliations:
University of Illinois, Urbana; University of Illinois, Urbana; University of Illinois, Urbana
Venue:
AAAI'05 Proceedings of the 20th national conference on Artificial intelligence - Volume 2
Year:
2005

Abstract

Entity matching is the problem of deciding if two given mentions in the data, such as "Helen Hunt" and "H. M. Hunt", refer to the same real-world entity. Numerous solutions have been developed, but they have not considered in depth the problem of exploiting integrity constraints that frequently exist in the domains. Examples of such constraints include "a mention with age two cannot match a mention with salary 200K" and "if two paper citations match, then their authors are likely to match in the same order". In this paper we describe a probabilistic solution to entity matching that exploits such constraints to improve matching accuracy. At the heart of the solution is a generative model that takes into account the constraints during the generation process, and provides well-defined interpretations of the constraints. We describe a novel combination of EM and relaxation labeling algorithms that efficiently learns the model, thereby matching mentions in an unsupervised way, without the need for annotated training data. Experiments on several real-world domains show that our solution can exploit constraints to significantly improve matching accuracy, by 3-12% F-1, and that the solution scales up to large data sets.
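
To make the idea of a hard constraint concrete, the following is a minimal sketch of a pairwise matcher that vetoes any pair violating a constraint before a similarity score is consulted. It is not the generative model or the EM and relaxation labeling procedure described in the abstract; the function names, the dictionary-based mention format, the use of `difflib` for name similarity, the threshold of 0.5, and the reading of the age/salary constraint as "age at most 2 versus salary at least 200K" are all assumptions made for illustration.

```python
from difflib import SequenceMatcher


def name_similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1]; a stand-in for a real string matcher."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def violates_hard_constraint(m1: dict, m2: dict) -> bool:
    """Hypothetical encoding of the abstract's example constraint:
    a mention with age two cannot match a mention with salary 200K.
    The thresholds (age <= 2, salary >= 200_000) are an assumed generalization."""
    for young, earner in ((m1, m2), (m2, m1)):
        age = young.get("age")
        salary = earner.get("salary")
        if age is not None and salary is not None:
            if age <= 2 and salary >= 200_000:
                return True
    return False


def is_match(m1: dict, m2: dict, threshold: float = 0.5) -> bool:
    """Declare a match only if no hard constraint is violated
    and the names are similar enough (threshold is an assumption)."""
    if violates_hard_constraint(m1, m2):
        return False
    return name_similarity(m1["name"], m2["name"]) >= threshold


if __name__ == "__main__":
    toddler = {"name": "Helen Hunt", "age": 2}
    executive = {"name": "H. M. Hunt", "salary": 200_000}
    # The constraint vetoes this pair regardless of name similarity.
    print(is_match(toddler, executive))
```

A soft constraint such as "if two citations match, their authors are likely to match in the same order" would not fit this veto pattern; it would instead adjust match probabilities, which is the role the abstract assigns to the probabilistic model.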