Incremental Detection of Inconsistencies in Distributed Data

Authors:
Wenfei Fan; Jianzhong Li; Nan Tang; Wenyuan Yu
Venue:
ICDE '12: Proceedings of the 2012 IEEE 28th International Conference on Data Engineering
Year:
2012

Abstract

This paper investigates the incremental detection of errors in distributed data. Given a distributed database D, a set \Sigma of conditional functional dependencies (CFDs), the set V of violations of the CFDs in D, and updates \Delta D to D, the problem is to find, with minimum data shipment, the changes \Delta V to V in response to \Delta D. The need for this study is evident: real-life data is often dirty, distributed, and frequently updated, and it is often prohibitively expensive to recompute the entire set of violations whenever D is updated. We show that the incremental detection problem is NP-complete for D partitioned either vertically or horizontally, even when \Sigma and D are fixed. Nevertheless, we show that it is bounded and, better still, admits optimal algorithms: errors can be detected with computational cost and data shipment that are both linear in the size of \Delta D and \Delta V, independent of the size of the database D. We provide such incremental algorithms for vertically partitioned data, and show that the algorithms are optimal. We further propose optimization techniques for the incremental algorithm over vertical partitions to reduce data shipment. We verify experimentally, using real-life data on Amazon Elastic Compute Cloud (EC2), that our algorithms substantially outperform their batch counterparts even when \Delta V is reasonably large.
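To make the problem statement concrete, the following is a minimal single-site sketch of incremental violation detection for one CFD. It is not the paper's algorithm: it ignores partitioning and data shipment entirely, and the class name `IncrementalCFDChecker`, its methods, and the example attributes (`cc`, `zip`, `street`) are hypothetical names chosen for illustration. It assumes a CFD of the form (X -> B) with an optional constant pattern on some attributes, and treats a tuple as a violation when another pattern-matching tuple agrees with it on X but differs on B. Each update touches only the affected group, so the work per update is proportional to the size of the update plus the size of the returned \Delta V.

```python
from collections import defaultdict


class IncrementalCFDChecker:
    """Tracks violations of one CFD (X -> B, pattern) under inserts and deletes.

    Hypothetical illustration only: single site, no data shipment.
    """

    def __init__(self, lhs, rhs, pattern=None):
        self.lhs = tuple(lhs)         # attributes X
        self.rhs = rhs                # attribute B
        self.pattern = pattern or {}  # constants a tuple must carry to be in scope
        # X-value -> B-value -> ids of tuples with that (X, B) combination
        self.index = defaultdict(lambda: defaultdict(set))

    def _matches(self, t):
        return all(t[a] == c for a, c in self.pattern.items())

    def _key(self, t):
        return tuple(t[a] for a in self.lhs)

    def insert(self, tid, t):
        """Insert tuple t; return the ids that newly become violations."""
        if not self._matches(t):
            return set()
        group = self.index[self._key(t)]
        b = t[self.rhs]
        distinct_before = len(group)   # distinct B-values before the insert
        is_new_value = b not in group
        group[b].add(tid)
        if distinct_before == 0 or (distinct_before == 1 and not is_new_value):
            return set()               # group is still consistent
        if distinct_before == 1:
            # Group just became inconsistent: every member is now a violation.
            return {i for ids in group.values() for i in ids}
        return {tid}                   # group was already inconsistent

    def delete(self, tid, t):
        """Delete tuple t; return the ids that stop being violations."""
        if not self._matches(t):
            return set()
        key = self._key(t)
        group = self.index[key]
        b = t[self.rhs]
        if tid not in group.get(b, ()):
            return set()               # tuple was never inserted
        distinct_before = len(group)
        group[b].discard(tid)
        if not group[b]:
            del group[b]
        if not group:
            del self.index[key]
        if distinct_before < 2:
            return set()               # t was not a violation
        removed = {tid}
        if len(group) == 1:
            # Group became consistent again: the remaining members are cleared.
            removed |= {i for ids in group.values() for i in ids}
        return removed
```

For example, with `IncrementalCFDChecker(["cc", "zip"], "street", pattern={"cc": "44"})`, inserting two tuples that share `cc` and `zip` but disagree on `street` reports both tuple ids as new violations on the second insert, and deleting either one reports both ids as removed from V.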