On the memory requirements of XPath evaluation over XML streams

  • Authors:
  • Ziv Bar-Yossef;Marcus Fontoura;Vanja Josifovski

  • Affiliations:
  • Department of Electrical Engineering, Technion, Haifa 32000, Israel;Yahoo! Research Labs, 701 First Avenue, Sunnyvale, CA 94089, USA;Yahoo! Research Labs, 701 First Avenue, Sunnyvale, CA 94089, USA

  • Venue:
  • Journal of Computer and System Sciences
  • Year:
  • 2007

Quantified Score

Hi-index 0.00

Visualization

Abstract

The important challenge of evaluating XPath queries over XML streams has sparked much interest in the past few years. A number of algorithms have been proposed, supporting wider fragments of the query language, and exhibiting better performance and memory utilization. Nevertheless, all the algorithms known to date use a prohibitively large amount of memory for certain types of queries. A natural question then is whether this memory bottleneck is inherent or just an artifact of the proposed algorithms. In this paper we initiate the first systematic and theoretical study of lower bounds on the amount of memory required to evaluate XPath queries over XML streams. We present a general lower bound technique, which given a query, specifies the minimum amount of memory that any algorithm evaluating the query on a stream would need to incur. The lower bounds are stated in terms of new graph-theoretic properties of queries. The proofs are based on tools from communication complexity. We then exploit insights learned from the lower bounds to obtain a new algorithm for XPath evaluation on streams. The algorithm uses space close to the optimum. Our algorithm deviates from the standard paradigm of using automata or transducers, thereby avoiding the need to store large transition tables.