Reconciling Inconsistent Data in Probabilistic XML Data Integration

Authors:
Tadeusz Pankowski
Affiliations:
Institute of Control and Information Engineering, Poznań University of Technology, Poland and Faculty of Mathematics and Computer Science, Adam Mickiewicz University, Poznań, Poland
Venue:
BNCOD '08 Proceedings of the 25th British national conference on Databases: Sharing Data, Information and Knowledge
Year:
2008

Abstract

Dealing with inconsistent data while integrating XML data from different sources is an important task, necessary to improve the quality of data integration. Typically, data cleaning (or repairing) procedures are applied to remove inconsistencies, i.e. conflicts between data. In this paper, we present a probabilistic XML data integration setting. A probability is assigned to each data source and models the reliability level of that source. In this way, an answer (a tuple of values of XML trees) has a probability assigned to it. The problem is how to compute this probability, especially when the same answer is produced by many sources. We consider three semantics for computing such probabilistic answers: by-peer, by-sequence, and by-subtree semantics. The probabilistic answers can be used to resolve a class of inconsistencies that violate XML functional dependencies defined over the target schema. Given a probability distribution over a set of conflicting answers, we can choose the answer with the highest probability of being correct.
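
The final step described in the abstract, choosing among conflicting answers by probability, can be illustrated with a minimal sketch. The abstract does not define the by-peer, by-sequence, or by-subtree semantics, so the code below does not implement any of them. It assumes, purely for illustration, that sources are independent and that the reliabilities of sources agreeing on an answer combine by a noisy-OR rule; the function names and the sample values are hypothetical.

```python
from collections import defaultdict
from math import prod


def combine_noisy_or(reliabilities):
    """Combine reliabilities of sources that agree on one answer.

    ASSUMPTION: sources are independent and combine by noisy-OR.
    This is an illustrative stand-in, not one of the paper's semantics.
    """
    return 1.0 - prod(1.0 - p for p in reliabilities)


def distribution_over_answers(observations):
    """Turn (answer, source_reliability) pairs into a normalized distribution."""
    by_answer = defaultdict(list)
    for answer, reliability in observations:
        by_answer[answer].append(reliability)

    scores = {a: combine_noisy_or(ps) for a, ps in by_answer.items()}
    total = sum(scores.values())
    if total == 0:
        # No source carries any weight; every answer gets probability 0.
        return {a: 0.0 for a in scores}
    return {a: s / total for a, s in scores.items()}


def most_probable_answer(observations):
    """Pick the conflicting answer with the highest probability."""
    dist = distribution_over_answers(observations)
    return max(dist, key=dist.get)


# Hypothetical conflict: three sources disagree on a value that a
# functional dependency requires to be unique.
observations = [("1975", 0.9), ("1957", 0.5), ("1957", 0.6)]
dist = distribution_over_answers(observations)
best = most_probable_answer(observations)
```

Under these assumptions, two moderately reliable sources that agree on one value can still be outweighed by a single highly reliable source, which is why the combination rule matters and why the paper studies several semantics for it.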