The merge/purge problem for large databases
SIGMOD '95 Proceedings of the 1995 ACM SIGMOD international conference on Management of data
Record linkage: making maximum use of the discriminating power of identifying information
Communications of the ACM
Learning object identification rules for information integration
Information Systems - Data extraction, cleaning and reconciliation
Interactive deduplication using active learning
Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining
Adaptive duplicate detection using learnable string similarity measures
Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining
Scheduling Algorithms
Reference reconciliation in complex information spaces
Proceedings of the 2005 ACM SIGMOD international conference on Management of data
Joint deduplication of multiple record types in relational data
Proceedings of the 14th ACM international conference on Information and knowledge management
Compilers: Principles, Techniques, and Tools (2nd Edition)
Compilers: Principles, Techniques, and Tools (2nd Edition)
Duplicate Record Detection: A Survey
IEEE Transactions on Knowledge and Data Engineering
Entity Resolution with Markov Logic
ICDM '06 Proceedings of the Sixth International Conference on Data Mining
Collective entity resolution in relational data
ACM Transactions on Knowledge Discovery from Data (TKDD)
Swoosh: a generic approach to entity resolution
The VLDB Journal — The International Journal on Very Large Data Bases
Large-Scale Deduplication with Constraints Using Dedupalog
ICDE '09 Proceedings of the 2009 IEEE International Conference on Data Engineering
Entity resolution with iterative blocking
Proceedings of the 2009 ACM SIGMOD International Conference on Management of data
Joint inference in information extraction
AAAI'07 Proceedings of the 22nd national conference on Artificial intelligence - Volume 1
Evaluation of entity resolution approaches on real-world match problems
Proceedings of the VLDB Endowment
Entity resolution with evolving rules
Proceedings of the VLDB Endowment
Large-scale collective entity matching
Proceedings of the VLDB Endowment
ICDE '12 Proceedings of the 2012 IEEE 28th International Conference on Data Engineering
Data Matching: Concepts and Techniques for Record Linkage, Entity Resolution, and Duplicate Detection
Hi-index | 0.00 |
Entity resolution (ER) is the problem of identifying which records in a database represent the same entity. Often, records of different types are involved (e.g., authors, publications, institutions, venues), and resolving records of one type can impact the resolution of other types of records. In this paper we propose a flexible, modular resolution framework where existing ER algorithms developed for a given record type can be plugged in and used in concert with other ER algorithms. Our approach also makes it possible to run ER on subsets of similar records at a time, important when the full data are too large to resolve together. We study the scheduling and coordination of the individual ER algorithms, in order to resolve the full dataset, and show the scalability of our approach. We also introduce a "state-based" training technique where each ER algorithm is trained for the particular execution context (relative to other types of records) where it will be used.