Errors Detection and Correction in Large Scale Data Collecting

Authors: Renato Bruni; Antonio Sassano
Venue: IDA '01 Proceedings of the 4th International Conference on Advances in Intelligent Data Analysis
Year: 2001

Abstract

The paper addresses the automatic detection and correction of inconsistent or out-of-range data in a general statistical data collection process. In this setting, errors are usually detected by formulating a set of rules that each data record must respect in order to be declared correct. First, the rule set itself is checked for inconsistency and redundancy by encoding it as a propositional logic formula and solving a sequence of Satisfiability (SAT) problems. The checked rule set is then used to detect erroneous records. In the subsequent error-correction phase, every erroneous record must be changed so that it satisfies all the rules, while being altered as little as possible and preserving the frequency distributions of the correct data. Second, error correction is modeled by encoding the rules as linear inequalities and solving a sequence of set covering problems. The proposed procedure is tested on a real-world census case.
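The two phases described above (checking the rule set, then detecting and minimally correcting records) can be illustrated with a minimal Python sketch on a hypothetical three-field census record. The field domains, the rules R1 to R3 and all function names below are assumptions made for illustration, not the authors' code. Where the paper encodes the rules in propositional logic and calls a SAT solver, this sketch simply enumerates the small finite domain; where the paper models correction as a sequence of set covering problems over linear inequalities, this sketch substitutes an exhaustive minimum-change search. It also makes no attempt to preserve frequency distributions.

```python
from itertools import combinations, product

# Hypothetical toy domains: each field takes a small finite set of categorical values.
DOMAINS = {
    "age": ["0-13", "14-17", "18+"],
    "marital": ["single", "married", "divorced"],
    "relation": ["head", "spouse", "child"],
}
FIELDS = list(DOMAINS)

# Hypothetical rules: each is a predicate that a *correct* record must satisfy.
RULES = {
    "R1": lambda r: r["age"] != "0-13" or r["marital"] == "single",
    "R2": lambda r: r["relation"] != "spouse" or r["marital"] == "married",
    "R3": lambda r: r["relation"] != "spouse" or r["age"] != "0-13",
}


def all_records():
    """Enumerate every record in the Cartesian product of the domains."""
    for values in product(*(DOMAINS[f] for f in FIELDS)):
        yield dict(zip(FIELDS, values))


def satisfiable(constraints):
    """Brute-force stand-in for a SAT call: does some record satisfy all constraints?"""
    return any(all(c(r) for c in constraints) for r in all_records())


def is_consistent(rules):
    """The rule set is consistent if at least one record can satisfy every rule."""
    return satisfiable(list(rules.values()))


def redundant_rules(rules):
    """A rule is redundant if (all other rules AND NOT rule) is unsatisfiable."""
    found = []
    for name, rule in rules.items():
        others = [r for n, r in rules.items() if n != name]
        negated = lambda rec, rule=rule: not rule(rec)
        if not satisfiable(others + [negated]):
            found.append(name)
    return found


def violated(record, rules):
    """Names of the rules that the record fails (empty list means the record is correct)."""
    return [name for name, rule in rules.items() if not rule(record)]


def minimal_correction(record, rules):
    """Change as few fields as possible so that every rule holds (exhaustive search)."""
    for k in range(len(FIELDS) + 1):
        for subset in combinations(FIELDS, k):
            for values in product(*(DOMAINS[f] for f in subset)):
                candidate = dict(record, **dict(zip(subset, values)))
                if not violated(candidate, rules):
                    return candidate, list(subset)
    return None, None


if __name__ == "__main__":
    record = {"age": "0-13", "marital": "married", "relation": "spouse"}
    print(is_consistent(RULES))      # True
    print(redundant_rules(RULES))    # ['R3']
    print(violated(record, RULES))   # ['R1', 'R3']
    print(minimal_correction(record, RULES))
    # ({'age': '14-17', 'marital': 'married', 'relation': 'spouse'}, ['age'])
```

In this toy case R3 is flagged as redundant because it follows from R1 and R2: a spouse must be married, and a person aged 0-13 must be single. The redundancy test is only meaningful once the rule set is known to be consistent, since with an unsatisfiable rule set every rule would be reported as redundant. Ties between equally small corrections are broken here by domain order, whereas the paper's procedure additionally aims to keep the frequency distributions of correct data intact.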