Automaton in or out: run-time plan optimization for XML stream processing

Authors:
Hong Su;Elke A. Rundensteiner;Murali Mani
Affiliations:
Oracle Corporation, Redwood Shores, CA;Worcester Polytechnic Institute, Worcester, MA;Worcester Polytechnic Institute, Worcester, MA
Venue:
SSPS '08 Proceedings of the 2nd international workshop on Scalable stream processing system
Year:
2008

Abstract

Many systems, such as Tukwila and YFilter, combine automaton and algebra techniques to process queries over tokenized XML streams. Typically in this architecture, an automaton first locates all query patterns in the input stream and composes the matched tokens into XML element nodes. These XML nodes are then passed to tuple-based algebraic operators for further filtering or restructuring. This common processing style is, however, not always optimal. At times it is more efficient to retrieve only a subset of the patterns in the automaton and to retrieve the remaining patterns from the XML element nodes. In this paper, we use a cost-based solution to explore this novel optimization opportunity. We design three plan optimization algorithms, namely MinExhaust, GreedyBasic, and FastPrune. We also study how to migrate from a currently running plan to a new plan in a safe and efficient manner. Our experiments show that the GreedyBasic or FastPrune algorithm can quickly find a plan that is close to optimal in most scenarios. We also show that the overhead of run-time statistics collection and plan migration in our approach is very low.
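
The optimization choice described in the abstract (which patterns to evaluate inside the automaton and which to evaluate on the composed element nodes) can be illustrated with a toy sketch. Everything below is an assumption made for illustration: the pattern strings, the cost numbers, the fixed navigation overhead, and the two search functions are hypothetical and are not the paper's cost model or its MinExhaust, GreedyBasic, or FastPrune algorithms. The sketch only contrasts an exhaustive search over all in/out assignments with a simple greedy descent.

```python
from itertools import combinations

# Hypothetical per-pattern costs in arbitrary units (not taken from the paper).
AUTOMATON_COST = {"//a": 4.0, "//a/b": 3.0, "//a/c": 1.0}
ALGEBRA_COST = {"//a": 9.0, "//a/b": 1.0, "//a/c": 2.0}
# Assumed fixed overhead paid once if any pattern is evaluated on element nodes.
NODE_NAVIGATION_OVERHEAD = 1.5


def plan_cost(in_automaton):
    """Cost of a plan that evaluates `in_automaton` in the automaton
    and every other pattern on the composed XML element nodes."""
    outside = [p for p in AUTOMATON_COST if p not in in_automaton]
    cost = sum(AUTOMATON_COST[p] for p in in_automaton)
    cost += sum(ALGEBRA_COST[p] for p in outside)
    if outside:
        cost += NODE_NAVIGATION_OVERHEAD
    return cost


def exhaustive_search():
    """Enumerate every subset of patterns and return the cheapest plan."""
    patterns = sorted(AUTOMATON_COST)
    candidates = (
        frozenset(subset)
        for size in range(len(patterns) + 1)
        for subset in combinations(patterns, size)
    )
    return min(candidates, key=plan_cost)


def greedy_search():
    """Start with all patterns in the automaton and repeatedly pull out
    the single pattern that lowers the cost most, until no move helps."""
    current = frozenset(AUTOMATON_COST)
    while True:
        neighbours = [current - {p} for p in current]
        best = min(neighbours, key=plan_cost, default=current)
        if plan_cost(best) >= plan_cost(current):
            return current
        current = best


if __name__ == "__main__":
    for name, search in (("exhaustive", exhaustive_search), ("greedy", greedy_search)):
        plan = search()
        print(name, sorted(plan), plan_cost(plan))
```

With these made-up numbers, both searches settle on the same plan, which keeps two of the three patterns in the automaton. The exhaustive search examines all 2^n subsets, whereas the greedy descent examines at most n neighbours per step, which is the kind of trade-off between plan quality and search time that motivates having more than one optimization algorithm.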