Efficient algorithms for evaluating xpath over streams

  • Authors:
  • Gang Gou;Rada Chirkova

  • Affiliations:
  • North Carolina State University, Raleigh, NC;North Carolina State University, Raleigh, NC

  • Venue:
  • Proceedings of the 2007 ACM SIGMOD international conference on Management of data
  • Year:
  • 2007

Quantified Score

Hi-index 0.00

Visualization

Abstract

In this paper we address the problem of evaluating XPath queries over streaming XML data. We consider a practical XPath fragment called Univariate XPath, which includes the commonly used '/' and '//' axes and allows *-node tests and arbitrarily nested predicates. It is well known that this XPath fragment can be efficiently evaluated in O(|D||Q|) time in the non-streaming environment, where |D| is the document size and |Q| is the query size. However, this is not necessarily true in the streaming environment, since streaming algorithms have to satisfy stricter requirement than non-streaming algorithms, in that all data must be read sequentially in one pass. Therefore, it is not surprising that state-of-the-art stream-querying algorithms have higher time complexity than O(|D||Q|). In this paper we revisit the XPath stream-querying problem, and show that Univariate XPath can be efficiently evaluated in O|D||Q|) time in the streaming environment. Specifically, we propose two O(|D||Q|)-time stream-querying algorithms, LQ and EQ, which are based on the lazy strategy and on the eager strategy, respectively. To the best of our knowledge, LQ and EQ are the first XPath stream-querying algorithms that achieve O(|D||Q|) time performance. Further, our algorithms achieve O(|D||Q|) time performance without trading off space performance. Instead, they have better buffering-space performance than state-of-the-art stream-querying algorithms. In particular, EQ achieves optimal buffering-space performance. Our experimental results show that our algorithms have not only good theoretical complexity but also considerable practical performance advantages over existing algorithms.