Optimizing XML querying using type-based document projection

Authors:
Véronique Benzaken;Giuseppe Castagna;Dario Colazzo;Kim Nguyễn
Affiliations:
LRI, Université Paris-Sud, CNRS, Orsay F-91405, France;CNRS, Université Paris Diderot, Sorbonne Paris Cité, Paris, France;LRI, Université Paris-Sud, CNRS, Orsay F-91405, France;LRI, Université Paris-Sud, CNRS, Orsay F-91405, France
Venue:
ACM Transactions on Database Systems (TODS)
Year:
2013

Abstract

XML data projection (or pruning) is a natural optimization for main-memory query engines: given a query Q over a document D, the subtrees of D that are not needed to evaluate Q are pruned, producing a smaller document D'. The query Q is then executed on D', which avoids allocating and processing nodes that Q will never reach. In this article, we propose a new approach, based on types, that greatly improves on current solutions. Besides providing comparable or greater precision and far lower pruning overhead, our solution, unlike current approaches, takes into account backward axes and predicates, and can be applied to multiple queries rather than just single ones. A side contribution is a new type system for XPath able to handle backward axes. The soundness of our approach is formally proved. Furthermore, we prove that the approach is also complete (i.e., yields the best possible type-driven pruning) for a relevant class of queries and schemas. We further validate our approach using the XMark and XPathMark benchmarks and show that pruning improves not only the performance of the main-memory query engine (as expected) but also that of state-of-the-art native XML databases.
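
The general idea of projection described in the abstract can be illustrated with a small sketch. This is not the type-based method of the article: it is a toy, path-based pruner that handles only a fixed sequence of child-axis steps, with no backward axes, predicates, or schema information. The function name `project`, the helper `size`, and the sample document are all hypothetical and chosen for illustration only.

```python
import copy
import xml.etree.ElementTree as ET


def project(elem, steps):
    """Return a pruned copy of elem for the child-axis path `steps`.

    Toy illustration only (an assumption, not the article's algorithm):
    steps[0] must match elem's tag and the remaining steps are matched
    against its children. Nodes matched by the last step keep their whole
    subtree; every other subtree is dropped. Text inside intermediate
    elements is not copied.
    """
    if not steps or elem.tag != steps[0]:
        return None
    if len(steps) == 1:
        return copy.deepcopy(elem)
    kept = ET.Element(elem.tag, dict(elem.attrib))
    for child in elem:
        sub = project(child, steps[1:])
        if sub is not None:
            kept.append(sub)
    return kept


def size(elem):
    """Count the element nodes in the tree rooted at elem."""
    return sum(1 for _ in elem.iter())


# Hypothetical sample document D.
DOC = """<site>
  <people>
    <person><name>Ann</name><age>30</age></person>
    <person><name>Bob</name></person>
  </people>
  <regions><item><name>Lamp</name></item></regions>
</site>"""

root = ET.fromstring(DOC)

# Query Q, written once as a step list for pruning and once as a path.
pruned = project(root, ["site", "people", "person", "name"])
query = "./people/person/name"

# Q returns the same answer on D and on the pruned document D'.
original_answer = [n.text for n in root.findall(query)]
pruned_answer = [n.text for n in pruned.findall(query)]

print(original_answer, pruned_answer)
print(size(root), size(pruned))
```

In this sketch the `regions` subtree and the `age` element are never copied into D', yet the query returns the same names on both documents. A real projector would have to decide what to keep without inspecting every node this way, which is where static information such as types comes in.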