Sampling the repairs of functional dependency violations under hard constraints
Proceedings of the VLDB Endowment
Violations of functional dependencies (FDs) and conditional functional dependencies (CFDs) are common in practice, often indicating deviations from the intended data semantics. These violations arise in many contexts such as data integration and Web data extraction. Resolving these violations is challenging for a variety of reasons, one of them being the exponential number of possible repairs. Most of the previous work has tackled this problem by producing a single repair that is nearly optimal with respect to some metric. In this paper, we propose a novel data cleaning approach that is not limited to finding a single repair, namely sampling from the space of possible repairs. We give several motivating scenarios where sampling from the space of CFD repairs is desirable, we propose a new class of useful repairs, and we present an algorithm that randomly samples from this space in an efficient way. We also show how to restrict the space of repairs based on constraints that reflect the accuracy of different parts of the database. We experimentally evaluate our algorithms against previous approaches to show the utility and efficiency of our approach.
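To make the notions of an FD violation and of an exponentially large repair space concrete, here is a minimal Python sketch. It is not the sampling algorithm proposed in the paper. It assumes a toy relation held as a list of dictionaries, a single FD zip -> city, and a deliberately restricted notion of repair in which every group of tuples sharing a left-hand-side value is overwritten with one of the right-hand-side values already observed in that group. All function names and the sample data are hypothetical.

```python
import random
from collections import defaultdict
from itertools import combinations

# Hypothetical toy relation; the FD under test is zip -> city.
rows = [
    {"zip": "10001", "city": "New York"},
    {"zip": "10001", "city": "NYC"},
    {"zip": "60601", "city": "Chicago"},
    {"zip": "60601", "city": "Chicgo"},
    {"zip": "94105", "city": "San Francisco"},
]


def lhs_key(row, lhs):
    """Project a row onto the left-hand-side attributes of the FD."""
    return tuple(row[a] for a in lhs)


def fd_violations(rows, lhs, rhs):
    """Return index pairs of rows that agree on lhs but differ on rhs."""
    groups = defaultdict(list)
    for i, row in enumerate(rows):
        groups[lhs_key(row, lhs)].append(i)
    return [
        (i, j)
        for members in groups.values()
        for i, j in combinations(members, 2)
        if rows[i][rhs] != rows[j][rhs]
    ]


def observed_values(rows, lhs, rhs):
    """Map each lhs value to the distinct rhs values seen with it, in order."""
    values = defaultdict(list)
    for row in rows:
        key = lhs_key(row, lhs)
        if row[rhs] not in values[key]:
            values[key].append(row[rhs])
    return values


def count_naive_repairs(rows, lhs, rhs):
    """Count repairs in the restricted space: one choice per lhs group."""
    total = 1
    for candidates in observed_values(rows, lhs, rhs).values():
        total *= len(candidates)
    return total


def sample_naive_repair(rows, lhs, rhs, rng):
    """Pick one observed rhs value per lhs group, independently and uniformly."""
    chosen = {
        key: rng.choice(candidates)
        for key, candidates in observed_values(rows, lhs, rhs).items()
    }
    return [{**row, rhs: chosen[lhs_key(row, lhs)]} for row in rows]


violations = fd_violations(rows, ["zip"], "city")
repair_count = count_naive_repairs(rows, ["zip"], "city")
repaired = sample_naive_repair(rows, ["zip"], "city", random.Random(0))
```

Even in this restricted space the number of repairs is the product of the per-group choices, so it doubles with every additional group that holds two conflicting values. That growth is what makes enumerating all repairs impractical and motivates sampling from the space instead.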