Incomplete Information in Relational Databases
Journal of the ACM (JACM)
Random sampling with a reservoir
ACM Transactions on Mathematical Software (TOMS)
Greed is good: approximating independent sets in sparse and bounded-degree graphs
STOC '94 Proceedings of the twenty-sixth annual ACM symposium on Theory of computing
Chasing constrained tuple-generating dependencies
PODS '96 Proceedings of the fifteenth ACM SIGACT-SIGMOD-SIGART symposium on Principles of database systems
Theoretical Computer Science - Special issue: principles and practice of constraint programming
The impact of poor data quality on the typical enterprise
Communications of the ACM
Consistent query answers in inconsistent databases
PODS '99 Proceedings of the eighteenth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
DISTANCE-SAT: complexity and algorithms
AAAI '99/IAAI '99 Proceedings of the sixteenth national conference on Artificial intelligence and the eleventh Innovative applications of artificial intelligence conference
Constraint-generating dependencies
Journal of Computer and System Sciences
AJAX: an extensible data cleaning tool
SIGMOD '00 Proceedings of the 2000 ACM SIGMOD international conference on Management of data
Arktos: towards the modeling, design, control and execution of ETL processes
Information Systems - Data extraction, cleaning and reconciliation
Problem of Incomplete Information in Relational Databases
Problem of Incomplete Information in Relational Databases
Computers and Intractability: A Guide to the Theory of NP-Completeness
Computers and Intractability: A Guide to the Theory of NP-Completeness
Real-world Data is Dirty: Data Cleansing and The Merge/Purge Problem
Data Mining and Knowledge Discovery
Condensed Representation of Database Repairs for Consistent Query Answering
ICDT '03 Proceedings of the 9th International Conference on Database Theory
Declarative Data Cleaning: Language, Model, and Algorithms
Proceedings of the 27th International Conference on Very Large Data Bases
Potter's Wheel: An Interactive Data Cleaning System
Proceedings of the 27th International Conference on Very Large Data Bases
Conditional Dependencies for Horizontal Decompositions
Proceedings of the 10th Colloquium on Automata, Languages and Programming
Errors Detection and Correction in Large Scale Data Collecting
IDA '01 Proceedings of the 4th International Conference on Advances in Intelligent Data Analysis
Methods for evaluating and creating data quality
Information Systems - Special issue: Data quality in cooperative information systems
A cost-based model and effective heuristic for repairing constraints by value modification
Proceedings of the 2005 ACM SIGMOD international conference on Management of data
Minimal-change integrity maintenance using tuple deletions
Information and Computation
Data Mining: Concepts and Techniques
Data Mining: Concepts and Techniques
ICDT'07 Proceedings of the 11th international conference on Database Theory
Conditional functional dependencies for capturing data inconsistencies
ACM Transactions on Database Systems (TODS)
Dependencies revisited for improving data quality
Proceedings of the twenty-seventh ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
On generating near-optimal tableaux for conditional functional dependencies
Proceedings of the VLDB Endowment
Discovering data quality rules
Proceedings of the VLDB Endowment
Semandaq: a data quality system based on conditional functional dependencies
Proceedings of the VLDB Endowment
A revival of integrity constraints for data cleaning
Proceedings of the VLDB Endowment
On approximating optimum repairs for functional dependency violations
Proceedings of the 12th International Conference on Database Theory
Data Quality Aware Queries in Collaborative Information Systems
APWeb/WAIM '09 Proceedings of the Joint International Conferences on Advances in Data and Web Management
Incorporating cardinality constraints and synonym rules into conditional functional dependencies
Information Processing Letters
Estimating the confidence of conditional functional dependencies
Proceedings of the 2009 ACM SIGMOD International Conference on Management of data
Conditional Dependencies: A Principled Approach to Improving Data Quality
BNCOD 26 Proceedings of the 26th British National Conference on Databases: Dataspace: The Final Frontier
Analyses and Validation of Conditional Dependencies with Built-in Predicates
DEXA '09 Proceedings of the 20th International Conference on Database and Expert Systems Applications
ERACER: a database approach for statistical inference and data cleaning
Proceedings of the 2010 ACM SIGMOD International Conference on Management of data
Consistent query answers in inconsistent probabilistic databases
Proceedings of the 2010 ACM SIGMOD International Conference on Management of data
GDR: a system for guided data repair
Proceedings of the 2010 ACM SIGMOD International Conference on Management of data
Development of foundation models for Internet of Things
Frontiers of Computer Science in China
Exploiting conflict structures in inconsistent databases
ADBIS'10 Proceedings of the 14th east European conference on Advances in databases and information systems
Towards certain fixes with editing rules and master data
Proceedings of the VLDB Endowment
Handling dirty databases: from user warning to data cleaning -- towards an interactive approach
SUM'10 Proceedings of the 4th international conference on Scalable uncertainty management
Proceedings of the VLDB Endowment
Context-aware replacement operations for data cleaning
Proceedings of the 2011 ACM Symposium on Applied Computing
Interaction between record matching and data repairing
Proceedings of the 2011 ACM SIGMOD International Conference on Management of data
Improving XML data quality with functional dependencies
DASFAA'11 Proceedings of the 16th international conference on Database systems for advanced applications - Volume Part I
Differential dependencies: Reasoning and discovery
ACM Transactions on Database Systems (TODS)
Support for user involvement in data cleaning
DaWaK'11 Proceedings of the 13th international conference on Data warehousing and knowledge discovery
Extending functional dependency to detect abnormal data in RDF graphs
ISWC'11 Proceedings of the 10th international conference on The semantic web - Volume Part I
Cost-efficient repair in inconsistent probabilistic databases
Proceedings of the 20th ACM international conference on Information and knowledge management
Towards certain fixes with editing rules and master data
The VLDB Journal — The International Journal on Very Large Data Bases
Detecting suspect answers in the presence of inconsistent information
FoIKS'12 Proceedings of the 7th international conference on Foundations of Information and Knowledge Systems
Repairing XML functional dependency violations
Information Sciences: an International Journal
Probabilistic query answering over inconsistent databases
Annals of Mathematics and Artificial Intelligence
Repairing inconsistent dimensions in data warehouses
Data & Knowledge Engineering
The data analytics group at the Qatar Computing Research Institute
ACM SIGMOD Record
Don't be SCAREd: use SCalable Automatic REpairing with maximal likelihood and bounded changes
Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data
NADEEF: a commodity data cleaning system
Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data
Extended dimensions for cleaning and querying inconsistent data warehouses
Proceedings of the sixteenth international workshop on Data warehousing and OLAP
A data cleaning framework based on user feedback
WAIM'13 Proceedings of the 14th international conference on Web-Age Information Management
Editorial: Efficient discovery of similarity constraints for matching dependencies
Data & Knowledge Engineering
The LLUNATIC data-cleaning framework
Proceedings of the VLDB Endowment
Extending inclusion dependencies with conditions
Theoretical Computer Science
Sampling from repairs of conditional functional dependency violations
The VLDB Journal — The International Journal on Very Large Data Bases
Two central criteria for data quality are consistency and accuracy. Inconsistencies and errors in a database often emerge as violations of integrity constraints. Given a dirty database D, one needs automated methods to make it consistent, i.e., to find a repair D' that satisfies the constraints and "minimally" differs from D. Equally important is ensuring that the automatically generated repair D' is accurate, or makes sense, i.e., that D' differs from the "correct" data within a predefined bound. This paper studies effective methods for improving both data consistency and accuracy. To specify the consistency of the data, we employ a class of conditional functional dependencies (CFDs) proposed in [6], which capture inconsistencies and errors beyond what their traditional counterparts, functional dependencies, can catch. To improve the consistency of the data, we propose two algorithms: one that automatically computes a repair D' satisfying a given set of CFDs, and another that incrementally finds a repair in response to updates to a clean database. We show that both problems are intractable. Although our algorithms are necessarily heuristic, we experimentally verify that the methods are effective and efficient. Moreover, we develop a statistical method that guarantees that the repairs found by the algorithms are accurate above a predefined rate without incurring excessive user interaction.
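The abstract does not spell out how CFD violations are detected or repaired. As a rough illustration only, the following sketch encodes a CFD as an embedded dependency X -> A plus a pattern tuple whose entries are constants or a wildcard `_`. It then detects violations and applies a naive majority-vote repair by value modification. All names (`CFD`, `violations`, `repair_once`, `naive_repair`, `WILDCARD`) are hypothetical. This is not the paper's algorithm: it ignores repair cost, makes no minimality or accuracy guarantee, and may stop at `max_rounds` without reaching a consistent state.

```python
from collections import Counter, defaultdict

WILDCARD = "_"  # pattern symbol that matches any value (assumed notation)


class CFD:
    """Hypothetical encoding of a CFD (X -> A, tp) with a single RHS attribute A."""

    def __init__(self, lhs, rhs, lhs_pattern, rhs_pattern):
        self.lhs = lhs                  # list of LHS attribute names X
        self.rhs = rhs                  # RHS attribute name A
        self.lhs_pattern = lhs_pattern  # constants or WILDCARD, aligned with lhs
        self.rhs_pattern = rhs_pattern  # constant or WILDCARD

    def applies_to(self, row):
        """True if row matches the LHS pattern, i.e. the CFD constrains it."""
        return all(p == WILDCARD or row[a] == p
                   for a, p in zip(self.lhs, self.lhs_pattern))


def violations(rows, cfd):
    """Return the indices of rows that take part in a violation of cfd."""
    bad, groups = set(), defaultdict(list)
    for i, row in enumerate(rows):
        if not cfd.applies_to(row):
            continue
        if cfd.rhs_pattern != WILDCARD:
            # Constant RHS: a single tuple can violate the CFD on its own.
            if row[cfd.rhs] != cfd.rhs_pattern:
                bad.add(i)
        else:
            # Variable RHS: matching tuples that agree on X must agree on A.
            groups[tuple(row[a] for a in cfd.lhs)].append(i)
    for idxs in groups.values():
        if len({rows[i][cfd.rhs] for i in idxs}) > 1:
            bad.update(idxs)
    return bad


def repair_once(rows, cfd):
    """Fix violations of one CFD in place by modifying RHS values; return #changes."""
    changes, groups = 0, defaultdict(list)
    for row in rows:
        if not cfd.applies_to(row):
            continue
        if cfd.rhs_pattern != WILDCARD:
            if row[cfd.rhs] != cfd.rhs_pattern:
                row[cfd.rhs] = cfd.rhs_pattern  # force the required constant
                changes += 1
        else:
            groups[tuple(row[a] for a in cfd.lhs)].append(row)
    for group in groups.values():
        # Majority vote; ties go to the value encountered first.
        majority = Counter(r[cfd.rhs] for r in group).most_common(1)[0][0]
        for r in group:
            if r[cfd.rhs] != majority:
                r[cfd.rhs] = majority
                changes += 1
    return changes


def naive_repair(rows, cfds, max_rounds=10):
    """Repeat per-CFD fixes on a copy until nothing changes or max_rounds is hit."""
    repaired = [dict(r) for r in rows]  # leave the input rows untouched
    for _ in range(max_rounds):
        # A fix for one CFD may break another, so sweep over all CFDs again.
        if sum(repair_once(repaired, c) for c in cfds) == 0:
            break
    return repaired
```

For example, with hypothetical customer rows and a constant CFD stating that CC=01 and AC=908 imply city=MH, `naive_repair` rewrites a city value of NYC to MH. For a variable CFD stating that, when CC=44, zip determines street, it keeps the first-seen street within each conflicting group. Majority voting is only a stand-in for the cost-based, accuracy-bounded repairs the abstract describes.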