Record deduplication is the task of merging database records that refer to the same underlying entity. In relational databases, accurate deduplication of records of one type often depends on the decisions made for records of other types. Whereas nearly all previous approaches have deduplicated records of different types independently, this work models these inter-dependencies explicitly to collectively deduplicate records of multiple types. We construct a conditional random field model of deduplication that captures these relational dependencies, and then employ a novel relational partitioning algorithm to jointly deduplicate records. On two citation-matching datasets, we show that collectively deduplicating paper and venue records yields up to a 30% error reduction in venue deduplication and up to a 20% error reduction in paper deduplication.
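The idea that decisions for one record type inform another can be illustrated with a small sketch. This is not the paper's conditional random field or its relational partitioning algorithm. It is a hypothetical greedy stand-in that alternates between paper and venue records and uses threshold-based union-find clustering. The names `sim`, `cluster`, `collective_dedup`, `threshold` and `bonus` are illustrative assumptions, and `difflib` string similarity takes the place of learned features. A pair of paper records gets a relational bonus when their venues are currently co-clustered. A pair of venue records gets a bonus when some pair of duplicate papers links them.

```python
from difflib import SequenceMatcher
from itertools import combinations


def sim(a: str, b: str) -> float:
    """String similarity in [0, 1]; a hypothetical stand-in for learned CRF features."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def cluster(n: int, linked_pairs) -> list[int]:
    """Union-find over n items; returns a cluster id (root index) for each item."""
    parent = list(range(n))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in linked_pairs:
        parent[find(i)] = find(j)
    return [find(i) for i in range(n)]


def collective_dedup(papers, venues, threshold=0.8, bonus=0.4, rounds=3):
    """Hypothetical alternating scheme, not the paper's algorithm.

    papers: list of (title, venue_index); venues: list of venue name strings.
    Each round clusters one record type using the other type's current clusters.
    """
    venue_ids = list(range(len(venues)))  # start with every venue in its own cluster
    paper_ids = list(range(len(papers)))
    for _ in range(rounds):
        # Papers: title similarity, plus a bonus if their venues are co-clustered.
        paper_links = [
            (i, j)
            for i, j in combinations(range(len(papers)), 2)
            if sim(papers[i][0], papers[j][0])
            + (bonus if venue_ids[papers[i][1]] == venue_ids[papers[j][1]] else 0.0)
            >= threshold
        ]
        paper_ids = cluster(len(papers), paper_links)

        # Venue pairs that are linked by at least one pair of duplicate papers.
        linked_venues = {
            frozenset((papers[i][1], papers[j][1]))
            for i, j in combinations(range(len(papers)), 2)
            if paper_ids[i] == paper_ids[j]
        }
        # Venues: name similarity, plus a bonus if duplicate papers link them.
        venue_links = [
            (a, b)
            for a, b in combinations(range(len(venues)), 2)
            if sim(venues[a], venues[b])
            + (bonus if frozenset((a, b)) in linked_venues else 0.0)
            >= threshold
        ]
        venue_ids = cluster(len(venues), venue_links)
    return paper_ids, venue_ids


# Toy, made-up records: two near-identical titles listed under differently written venues.
papers = [
    ("Collective deduplication", 0),
    ("Collective de-duplication", 1),
    ("Graph cuts", 2),
]
venues = ["Proc. KDD", "KDD", "TPAMI"]

paper_ids, venue_ids = collective_dedup(papers, venues)
print(paper_ids, venue_ids)
```

On this toy input, "Proc. KDD" and "KDD" have a string similarity of only 0.5, so they are merged solely because their papers were judged duplicates. With `bonus=0.0` the two venues stay separate, which mirrors, in miniature, why joint deduplication can help the harder record type.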