Efficiently loading and processing XML streams

Authors:
Ming Li;Murali Mani;Elke A. Rundensteiner
Affiliations:
Worcester Polytechnic Institute, Massachusetts;Worcester Polytechnic Institute, Massachusetts;Worcester Polytechnic Institute, Massachusetts
Venue:
IDEAS '08 Proceedings of the 2008 international symposium on Database engineering & applications
Year:
2008

Abstract

XML stream applications bring the novel challenge of efficiently processing queries over token-based input streams that can only be accessed sequentially. Our Raindrop project is the first to accommodate token-based stream processing in an algebraic framework that models tokens and tuples uniformly. In this paper, we illustrate how the stream loading model of our system navigates the XML input stream on the fly while concurrently constructing a minimized, lightweight XML tree representation, called a navigation-free data instance. The captured XML fragments are minimized in terms of buffer consumption. Building on this compact representation, we propose techniques for the subsequent algebraic query evaluation, in particular effective strategies for supporting multi-mode query operators and alternative data output semantics. The proposed stream loading model requires a much smaller buffer footprint than alternative solutions in the literature such as Y-Filter. In addition, the proposed algebra-based evaluation techniques handle data recursion over XML streams effectively, i.e., they avoid the overhead of structural join operators. Our stream loading and query evaluation techniques have been implemented as part of the Raindrop system, and experimental results based on this implementation are also reported.
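
The stream loading idea in the abstract, navigating a token stream on the fly and buffering only the fragments a query needs, can be illustrated with a minimal sketch. This is not the Raindrop implementation: the class `FragmentLoader`, the function `load_fragments`, the dictionary layout of a fragment, and the restriction to a simple child-axis path such as `/catalog/book` are all assumptions made for illustration. It uses Python's standard `xml.sax` parser as the token source and does not attempt data recursion or algebraic operators.

```python
import xml.sax


class FragmentLoader(xml.sax.ContentHandler):
    """Hypothetical loader: buffers only elements matched by a child-axis path."""

    def __init__(self, path):
        super().__init__()
        self.target = path.strip("/").split("/")  # e.g. ["catalog", "book"]
        self.stack = []      # names of the currently open elements
        self.building = []   # partially built nodes inside a matched element
        self.fragments = []  # completed matched fragments

    def startElement(self, name, attrs):
        self.stack.append(name)
        # Create a node only when a match starts or we are already inside one.
        if self.building or self.stack == self.target:
            node = {"tag": name, "text": "", "children": []}
            if self.building:
                self.building[-1]["children"].append(node)
            self.building.append(node)

    def characters(self, content):
        # Text outside a matched element is never buffered.
        if self.building:
            self.building[-1]["text"] += content

    def endElement(self, name):
        self.stack.pop()
        if self.building:
            node = self.building.pop()
            if not self.building:  # the matched element itself just closed
                self.fragments.append(node)


def load_fragments(xml_text, path):
    """Parse xml_text token by token and return the fragments matching path."""
    handler = FragmentLoader(path)
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.fragments


doc = (
    "<catalog><book><title>A</title><price>10</price></book>"
    "<note>skip</note><book><title>B</title></book></catalog>"
)
fragments = load_fragments(doc, "/catalog/book")
```

In this sketch the `note` element is seen as tokens but never stored, so memory use grows with the matched fragments rather than with the whole document. A real engine would release each fragment once downstream operators have consumed it, which the sketch omits by collecting everything in a list.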