Generic entity resolution with negative rules

Authors:
Steven Euijong Whang;Omar Benjelloun;Hector Garcia-Molina
Affiliations:
Computer Science Department, Stanford University, Stanford, USA 94305;Google Inc., Mountain View, USA 94043;Computer Science Department, Stanford University, Stanford, USA 94305
Venue:
The VLDB Journal — The International Journal on Very Large Data Bases
Year:
2009

Abstract

Entity resolution (ER), also known as deduplication or merge-purge, is the process of identifying records that refer to the same real-world entity and merging them. In practice, ER results may contain "inconsistencies," due either to mistakes by the writers of the match and merge functions or to changes in the application semantics. To remove the inconsistencies, we introduce "negative rules" that disallow inconsistencies in the ER solution (ER-N). A consistent solution is then derived based on guidance from a domain expert. The inconsistencies can be resolved in several ways, leading to accurate solutions. We formalize ER-N, treating the match, merge, and negative rules as black boxes, which permits expressive and extensible ER-N solutions. We identify important properties for the rules that, if satisfied, enable less costly ER-N. We develop and evaluate two algorithms that find an ER-N solution based on guidance from the domain expert: the GNR algorithm, which does not assume the properties, and the ENR algorithm, which exploits them.
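
To make the black-box view in the abstract concrete, the following is a minimal illustrative sketch in Python. It is not the GNR or ENR algorithm, and it does not model the domain expert's guidance. It only shows how a match rule, a merge rule, and a negative rule might be plugged in as opaque functions, and how a naive match-and-merge pass can produce a record that a negative rule flags as inconsistent. The record representation, the three rule bodies, and all function names (`record`, `match`, `merge`, `violates`, `naive_er`, `inconsistent`) are assumptions made for this example, not taken from the paper.

```python
from itertools import combinations

def record(**attrs):
    """Hypothetical record: a frozenset of (attribute, value) pairs."""
    return frozenset(attrs.items())

def match(r, s):
    """Assumed black-box match rule: two records match if they share a name."""
    names_r = {v for a, v in r if a == "name"}
    names_s = {v for a, v in s if a == "name"}
    return bool(names_r & names_s)

def merge(r, s):
    """Assumed black-box merge rule: union of the attribute-value pairs."""
    return r | s

def violates(r):
    """Assumed black-box negative rule on a single record:
    a record must not carry two different gender values."""
    return len({v for a, v in r if a == "gender"}) > 1

def naive_er(records):
    """Merge matching pairs until no pair matches (negative rules ignored)."""
    current = set(records)
    changed = True
    while changed:
        changed = False
        for r, s in combinations(current, 2):
            if match(r, s):
                # Replace the matching pair with its merged record and rescan.
                current -= {r, s}
                current.add(merge(r, s))
                changed = True
                break
    return current

def inconsistent(records):
    """Records in the ER result that the negative rule disallows."""
    return {r for r in records if violates(r)}

r1 = record(name="Pat Smith", gender="F")
r2 = record(name="Pat Smith", gender="M")
r3 = record(name="Lee Wong", gender="M")

result = naive_er([r1, r2, r3])
flagged = inconsistent(result)
```

In this toy input, `r1` and `r2` match on name and are merged, and the merged record is then flagged because it carries two gender values. Deciding how to repair such a result (for example, which records to keep apart) is the part the abstract assigns to the ER-N algorithms and the domain expert, and it is deliberately left out of this sketch.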