Containment of nested XML queries

  • Authors:
  • Xin Dong;Alon Y. Halevy;Igor Tatarinov

  • Affiliations:
  • University of Washington, Seattle;University of Washington, Seattle;University of Washington, Seattle

  • Venue:
  • VLDB '04 Proceedings of the Thirtieth international conference on Very large data bases - Volume 30
  • Year:
  • 2004

Quantified Score

Hi-index 0.00

Visualization

Abstract

Query containment is the most fundamental relationship between a pair of database queries: a query Q is said to be contained in a query Q′ if the answer for Q is always a subset of the answer for Q′, independent of the current state of the database. Query containment is an important problem in a wide variety of data management applications, including verification of integrity constraints, reasoning about contents of data sources in data integration, semantic caching, verification of knowledge bases, determining queries independent of updates, and most recently, in query reformulation for peer data management systems. Query containment has been studied extensively in the relational context and for XPath queries, but not for XML queries with nesting. We consider the theoretical aspects of the problem of query containment for XML queries with nesting. We begin by considering conjunctive XML queries (c-XQueries), and show that containment is in polynomial time if we restrict the fanout (number of sibling sub-blocks) to be 1. We prove that for arbitrary fanout, containment is coNP-hard already for queries with nesting depth 2, even if the query does not include variables in the return clauses. We then show that for queries with fixed nesting depth, containment is coNP-complete. Next, we establish the computational complexity of query containment for several practical extensions of c-XQueries, including queries with union and arithmetic comparisons, and queries where the XPath expressions may include descendant edges and negation. Finally, we describe a few heuristics for speeding up query containment checking in practice by exploiting properties of the queries and the underlying schema.