Logical foundations of artificial intelligence
Logical foundations of artificial intelligence
Readings in nonmonotonic reasoning
Readings in nonmonotonic reasoning
The merge/purge problem for large databases
SIGMOD '95 Proceedings of the 1995 ACM SIGMOD international conference on Management of data
Artificial intelligence: a new synthesis
Artificial intelligence: a new synthesis
Efficient clustering of high-dimensional data sets with application to reference matching
Proceedings of the sixth ACM SIGKDD international conference on Knowledge discovery and data mining
Learning object identification rules for information integration
Information Systems - Data extraction, cleaning and reconciliation
Active Database Systems: Triggers and Rules for Advanced Database Processing
Active Database Systems: Triggers and Rules for Advanced Database Processing
Interactive deduplication using active learning
Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining
Robust Identification of Fuzzy Duplicates
ICDE '05 Proceedings of the 21st International Conference on Data Engineering
Reference reconciliation in complex information spaces
Proceedings of the 2005 ACM SIGMOD international conference on Management of data
A cost-based model and effective heuristic for repairing constraints by value modification
Proceedings of the 2005 ACM SIGMOD international conference on Management of data
Relational clustering for multi-type entity resolution
MRDM '05 Proceedings of the 4th international workshop on Multi-relational mining
Profile-Based Object Matching for Information Integration
IEEE Intelligent Systems
Duplicate Record Detection: A Survey
IEEE Transactions on Knowledge and Data Engineering
Leveraging aggregate constraints for deduplication
Proceedings of the 2007 ACM SIGMOD international conference on Management of data
D-Swoosh: A Family of Algorithms for Generic, Distributed Entity Resolution
ICDCS '07 Proceedings of the 27th International Conference on Distributed Computing Systems
Semantic integrity in a relational data base system
VLDB '75 Proceedings of the 1st International Conference on Very Large Data Bases
Functional specifications of a subsystem for data base integrity
VLDB '75 Proceedings of the 1st International Conference on Very Large Data Bases
Swoosh: a generic approach to entity resolution
The VLDB Journal — The International Journal on Very Large Data Bases
Constraint-based entity matching
AAAI'05 Proceedings of the 20th national conference on Artificial intelligence - Volume 2
On the computational complexity of minimal-change integrity maintenance in relational databases
Inconsistency Tolerance
Entity resolution with evolving rules
Proceedings of the VLDB Endowment
Entity Resolution and Information Quality
Entity Resolution and Information Quality
Interaction between record matching and data repairing
Proceedings of the 2011 ACM SIGMOD International Conference on Management of data
Incorporating domain knowledge and user expertise in probabilistic Tuple merging
SUM'11 Proceedings of the 5th international conference on Scalable uncertainty management
NADEEF: a commodity data cleaning system
Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data
Hybrid entity clustering using crowds and data
The VLDB Journal — The International Journal on Very Large Data Bases
Incremental entity resolution on rules and data
The VLDB Journal — The International Journal on Very Large Data Bases
Hi-index | 0.01 |
Entity resolution (ER) (also known as deduplication or merge-purge) is a process of identifying records that refer to the same real-world entity and merging them together. In practice, ER results may contain "inconsistencies," either due to mistakes by the match and merge function writers or changes in the application semantics. To remove the inconsistencies, we introduce "negative rules" that disallow inconsistencies in the ER solution (ER-N). A consistent solution is then derived based on the guidance from a domain expert. The inconsistencies can be resolved in several ways, leading to accurate solutions. We formalize ER-N, treating the match, merge, and negative rules as black boxes, which permits expressive and extensible ER-N solutions. We identify important properties for the rules that, if satisfied, enable less costly ER-N. We develop and evaluate two algorithms that find an ER-N solution based on guidance from the domain expert: the GNR algorithm that does not assume the properties and the ENR algorithm that exploits the properties.