Efficient evaluation of XQuery over streaming data

  • Authors: Xiaogang Li; Gagan Agrawal
  • Affiliations: Ohio State University, Columbus, OH (both authors)
  • Venue: VLDB '05: Proceedings of the 31st International Conference on Very Large Data Bases
  • Year: 2005

Abstract

With the growing popularity of XML and the emergence of the streaming data model, processing queries over streaming XML has become an important topic. This paper presents a new framework and a set of techniques for processing XQuery over streaming data. As compared to the existing work on supporting XPath/XQuery over data streams, we make the following three contributions:

1. We propose a series of optimizations which transform XQuery queries so that they can be correctly executed with a single pass on the dataset.
2. We present a methodology for determining when an XQuery query, possibly after the transformations we introduce, can be correctly executed with only a single pass on the dataset.
3. We describe a code generation approach which can handle XQuery queries with user-defined aggregates, including recursive functions. We aggressively use static analysis and generate executable code, i.e., we do not require a query plan to be interpreted at runtime.

We have evaluated our implementation using several XMark benchmarks and three other XQuery queries driven by real applications. Our experimental results show that, as compared to Qizx/Open, Saxon, and Galax, our system: 1) is at least 25% faster on XMark queries with small datasets, 2) is significantly faster on XMark queries with larger datasets, 3) is at least one order of magnitude faster on the queries driven by real applications because, unlike the other systems, we can transform them to execute with a single pass, and 4) executes queries efficiently on large datasets on which other systems often have memory overflows.
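
To make the idea of single-pass evaluation concrete, the following is a minimal hand-written sketch in Python; it is not the authors' system or their generated code. It assumes a query that sums one numeric child of each repeated element, and the names `streaming_sum`, `item`, and `price` are hypothetical, chosen only for illustration. The aggregate is updated as each element closes, so the stream is read exactly once.

```python
import io
import xml.etree.ElementTree as ET


def streaming_sum(source, tag, field):
    """Sum the numeric text of each <field> child of <tag> in one pass.

    Hypothetical helper for illustration; not taken from the paper.
    """
    total = 0.0
    count = 0
    # iterparse yields each element when its end tag is read, so the
    # input is consumed sequentially and never revisited.
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == tag:
            value = elem.findtext(field)
            if value is not None:
                total += float(value)
                count += 1
            # Drop the element's children and text once it has been
            # folded into the aggregate; the emptied element itself
            # stays attached to its parent.
            elem.clear()
    return total, count


# Assumed sample input, standing in for a data stream.
doc = b"<items><item><price>10.5</price></item><item><price>4.5</price></item></items>"
total, count = streaming_sum(io.BytesIO(doc), "item", "price")
```

A query that needed a value from later in the stream before it could process earlier elements could not be evaluated this way without buffering, which is the kind of case the single-pass analysis described in the abstract has to detect.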