Answering queries using views over probabilistic XML: complexity and tractability

Authors:
Bogdan Cautis; Evgeny Kharlamov
Affiliations:
Institut Mines-Télécom, Télécom ParisTech, CNRS LTCI, Paris, France; KRDB Research Centre, Free University of Bozen-Bolzano, Italy
Venue:
Proceedings of the VLDB Endowment
Year:
2012


Abstract

We study the complexity of answering queries using views in a probabilistic XML setting, identifying large classes of XPath queries (with child and descendant navigation and predicates) for which there are efficient (PTime) algorithms. We consider this problem under the two possible semantics for XML query results: with persistent node identifiers and without them. Accordingly, we consider rewritings that exploit a single view, by means of compensation, and rewritings that use multiple views, by means of intersection. Since queries in a probabilistic setting return answers with probabilities, the rewriting problem goes beyond the classic one of retrieving XML answers from views. For both semantics, we show that, even when XML answers can be retrieved from views, their probabilities may not be computable. For rewritings that use only compensation, we describe a PTime decision procedure, based on easily verifiable criteria that distinguish the feasible cases, in which probabilistic XML results are computable, from the infeasible ones. For rewritings that can use multiple views, with compensation and intersection, we identify the most permissive conditions that make probabilistic rewriting feasible, and we describe an algorithm that is sound in general and becomes complete under fairly permissive restrictions, running in PTime modulo equivalence tests that take exponential time in the worst case. This is the best we can hope for, since intersection makes query equivalence intractable already over deterministic data. Our algorithm runs in PTime whenever deterministic rewritings can be found in PTime.
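To make the idea of a single-view rewriting by compensation more concrete, here is a minimal Python sketch. It is not the paper's algorithm or data model: it assumes a toy probabilistic tree in which each node exists independently with a given probability conditioned on its parent, handles only predicate-free child paths, and assumes persistent node identifiers. All names (`PNode`, `eval_path`, `to_dict`) are hypothetical. It shows one easy case in which the probability of each answer to the query /a/b/c can be recovered from the materialized view /a/b by navigating one further step inside each view result.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class PNode:
    """Toy probabilistic XML node (an assumption, not the paper's model).

    `prob` is the probability that the node exists given that its parent
    exists, independently of all other nodes.
    """
    node_id: int
    label: str
    prob: float = 1.0
    children: List["PNode"] = field(default_factory=list)


def eval_path(root: PNode, labels: List[str], root_prob: float) -> List[Tuple[PNode, float]]:
    """Evaluate the child-only path labels[0]/labels[1]/... starting at `root`.

    `root_prob` is the probability already attached to `root`.
    Returns (node, probability that the node is in the answer) pairs.
    """
    if root.label != labels[0]:
        return []
    frontier = [(root, root_prob)]
    for label in labels[1:]:
        frontier = [
            (child, p * child.prob)
            for node, p in frontier
            for child in node.children
            if child.label == label
        ]
    return frontier


def to_dict(result: List[Tuple[PNode, float]]) -> Dict[int, float]:
    """Key answers by persistent node identifier."""
    return {node.node_id: p for node, p in result}


# A small document: a(b(c, d), b(c)) with existence probabilities.
doc = PNode(1, "a", 1.0, [
    PNode(2, "b", 0.5, [PNode(3, "c", 0.5), PNode(6, "d", 1.0)]),
    PNode(4, "b", 0.75, [PNode(5, "c", 0.5)]),
])

# Direct evaluation of the query /a/b/c over the document.
direct = eval_path(doc, ["a", "b", "c"], doc.prob)

# Materialized view /a/b: each result is a b-subtree with its probability.
view = eval_path(doc, ["a", "b"], doc.prob)

# Compensation: navigate b/c inside each view result, starting from the
# probability the view already attached to that result.
rewritten = [
    answer
    for b_node, b_prob in view
    for answer in eval_path(b_node, ["b", "c"], b_prob)
]

print(to_dict(direct))     # {3: 0.25, 5: 0.375}
print(to_dict(rewritten))  # same answers and probabilities
```

In this toy setting the two results coincide because the extra navigation step is independent of how the view result was reached. Per the abstract, this does not hold in general: even when the XML answers themselves can be retrieved from views, their probabilities may not be computable.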