The Space Complexity of Processing XML Twig Queries Over Indexed Documents

Authors:
Mirit Shalem; Ziv Bar-Yossef
Affiliations:
Department of Computer Science, Technion, Haifa, Israel. mirit2s@cs.technion.ac.il; Department of Electrical Engineering, Technion, Haifa, Israel / Google Haifa Engineering Center, Haifa, Israel. zivby@ee.technion.ac.il
Venue:
ICDE '08 Proceedings of the 2008 IEEE 24th International Conference on Data Engineering
Year:
2008

Citing:
0
Cited by:
7

Principles of Holism for sequential twig pattern matching. The VLDB Journal — The International Journal on Very Large Data Bases
Machine models for query processing. ACM SIGMOD Record
Towards unifying advances in twig join algorithms. ADC '10 Proceedings of the Twenty-First Australasian Conference on Database Technologies - Volume 104
Benchmarking holistic approaches to XML tree pattern query processing. DASFAA'10 Proceedings of the 15th international conference on Database systems for advanced applications
Indexing and querying XML using extended Dewey labeling scheme. Data & Knowledge Engineering
Fast optimal twig joins. Proceedings of the VLDB Endowment
Adding logical operators to tree pattern queries on graph-structured data. Proceedings of the VLDB Endowment

Abstract

Current twig join algorithms incur high memory costs on queries that involve child-axis nodes. In this paper we provide an analytical explanation for this phenomenon. In a first large-scale study of the space complexity of evaluating XPath queries over indexed XML documents, we show that the space required depends on three factors: (1) whether the query is a path or a tree; (2) the types of axes occurring in the query and their occurrence pattern; and (3) the mode of query evaluation (filtering, full-fledged, or "pattern matching"). Our lower bounds imply that evaluating a large class of queries with child-axis nodes indeed requires large space. Our study also reveals that, on some queries, there is a large gap between the space needed for pattern matching and the space needed for full-fledged evaluation or filtering. This implies that many existing twig join algorithms, which work in the pattern matching mode, incur significant space overhead. We present a new twig join algorithm that avoids this overhead. On certain queries our algorithm is far more space-efficient than existing algorithms, sometimes reducing the space from linear in the document size to constant.
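
The three evaluation modes named in the abstract can be illustrated with a small sketch. It assumes the usual reading of the terms: filtering decides whether the document matches at all, full-fledged evaluation returns the elements selected by the query's output node, and pattern matching returns every full match of the query. The function names, the toy query //a//b, and the sample document are hypothetical, and the naive traversal below is not the algorithm proposed in the paper; it only shows how the three result sizes can differ.

```python
import xml.etree.ElementTree as ET


def descendants(node):
    """Yield all proper descendants of node in document order."""
    for child in node:
        yield child
        yield from descendants(child)


def pattern_matches(root, anc_tag, desc_tag):
    """Pattern matching mode: every (ancestor, descendant) pair for //anc_tag//desc_tag."""
    matches = []
    for node in root.iter():  # iter() includes root itself
        if node.tag == anc_tag:
            for d in descendants(node):
                if d.tag == desc_tag:
                    matches.append((node, d))
    return matches


def full_fledged(root, anc_tag, desc_tag):
    """Full-fledged mode: distinct elements selected by the output node (desc_tag)."""
    # Naively derived from the match list; deduplicate by object identity.
    selected = {id(d): d for _, d in pattern_matches(root, anc_tag, desc_tag)}
    return list(selected.values())


def filtering(root, anc_tag, desc_tag):
    """Filtering mode: does the document match the query at all?"""
    return bool(pattern_matches(root, anc_tag, desc_tag))


# Hypothetical sample document: three nested <a> elements around one <b>.
doc = ET.fromstring("<a><a><a><b/></a></a></a>")

print(len(pattern_matches(doc, "a", "b")))  # one match tuple per enclosing <a>
print(len(full_fledged(doc, "a", "b")))     # the single <b> element
print(filtering(doc, "a", "b"))             # a yes/no answer
```

In this toy document the number of match tuples grows with the nesting depth of <a>, while the full-fledged answer stays at one element and filtering yields a single bit. This only illustrates the difference in what each mode must produce; the paper's space bounds concern the memory an algorithm needs while processing the indexed document.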