We show that aggregate constraints (as opposed to pairwise constraints), which often arise when integrating multiple sources of data, can be leveraged to enhance the quality of deduplication. Despite the appeal of this approach, we show that the problem is challenging both semantically and computationally. We define a restricted search space for deduplication that is intuitive in our context, and we solve the problem optimally over that restricted space. Our experiments on real data show that incorporating aggregate constraints significantly enhances the accuracy of deduplication.
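
To make the distinction concrete, the following is a minimal illustrative sketch of what an aggregate constraint could look like; it is not the formulation or algorithm from the paper. A pairwise constraint speaks about two records at a time, whereas an aggregate constraint speaks about a whole group of records merged as duplicates. The sketch assumes a hypothetical setting in which one source holds individual billing records and a second source reports a total per entity, so a candidate grouping is acceptable only if each group's amounts sum to the reported total. The function name, the data layout, and the sample values are all assumptions made for illustration.

```python
from collections import defaultdict


def satisfies_aggregate_constraint(amounts, grouping, expected_totals, tol=1e-9):
    """Return True if every group's summed amount matches its expected total.

    amounts:         record id -> amount (hypothetical first source)
    grouping:        record id -> group label; records sharing a label are
                     treated as duplicates of one entity
    expected_totals: group label -> total reported by a hypothetical second source
    """
    totals = defaultdict(float)
    for record_id, amount in amounts.items():
        totals[grouping[record_id]] += amount

    for group, total in totals.items():
        # A group with no reported total cannot be validated, so reject it.
        if group not in expected_totals:
            return False
        if abs(total - expected_totals[group]) > tol:
            return False
    return True


# Hypothetical sample data: three billing records and two reported totals.
amounts = {"r1": 40, "r2": 60, "r3": 100}
expected_totals = {"smith": 100, "lee": 100}

# Candidate grouping A merges r1 and r2 into one entity.
grouping_a = {"r1": "smith", "r2": "smith", "r3": "lee"}
# Candidate grouping B merges r2 and r3 instead.
grouping_b = {"r1": "smith", "r2": "lee", "r3": "lee"}

result_a = satisfies_aggregate_constraint(amounts, grouping_a, expected_totals)
result_b = satisfies_aggregate_constraint(amounts, grouping_b, expected_totals)
```

Under these assumptions, grouping A is consistent with the reported totals and grouping B is not, even if a textual similarity measure alone could not tell the two apart. The check only validates a given grouping; searching over candidate groupings is the hard part the abstract refers to and is not attempted here.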