Automaton in or out: run-time plan optimization for XML stream processing

  • Authors: Hong Su; Elke A. Rundensteiner; Murali Mani
  • Affiliations: Oracle Corporation, Redwood Shores, CA; Worcester Polytechnic Institute, Worcester, MA; Worcester Polytechnic Institute, Worcester, MA
  • Venue: SSPS '08 Proceedings of the 2nd international workshop on Scalable stream processing system
  • Year: 2008

Abstract

Many systems, such as Tukwila and YFilter, combine automaton and algebra techniques to process queries over tokenized XML streams. Typically in this architecture, an automaton first locates all query patterns in the input stream and composes the matched tokens into XML element nodes. These nodes are then passed to tuple-based algebraic operators for further filtering or restructuring. This common processing style is, however, not always optimal: at times it is more efficient to retrieve only a subset of the patterns in the automaton and to retrieve the remaining patterns from the XML element nodes. In this paper, we use a cost-based solution to explore this novel optimization opportunity. We design three plan optimization algorithms, namely MinExhaust, GreedyBasic, and FastPrune. We also study how to migrate from a currently running plan to a new plan safely and efficiently. Our experiments show that GreedyBasic or FastPrune can quickly find a near-optimal plan in most scenarios. We also show that the overhead of run-time statistics collection and plan migration in our approach is very low.
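
To make the search problem in the abstract concrete, the following is a minimal sketch of choosing which patterns to evaluate in the automaton and which to leave to the algebra. It is not the paper's MinExhaust, GreedyBasic, or FastPrune; it only contrasts a generic exhaustive search with a generic greedy descent over the same plan space. The cost model `toy_cost`, the pattern names, and all cost figures are assumptions made up for illustration.

```python
from itertools import combinations

# Hypothetical per-pattern costs: (cost in automaton, cost in algebra).
# These figures are invented for illustration, not measured.
PATTERNS = {
    "/a": (1.0, 5.0),
    "/a/b": (4.0, 2.0),
    "/a/b/c": (3.0, 1.0),
}


def toy_cost(in_automaton, patterns):
    """Assumed cost model: each pattern pays its automaton cost if it is
    pushed into the automaton, otherwise its algebra cost."""
    return sum(
        auto if name in in_automaton else algebra
        for name, (auto, algebra) in patterns.items()
    )


def exhaustive(patterns, cost):
    """Enumerate every subset of patterns to push into the automaton."""
    names = list(patterns)
    best = None
    for k in range(len(names) + 1):
        for subset in combinations(names, k):
            plan = frozenset(subset)
            c = cost(plan, patterns)
            if best is None or c < best[1]:
                best = (plan, c)
    return best


def greedy(patterns, cost):
    """Start with all patterns in the automaton and repeatedly pull out the
    single pattern whose removal lowers the cost most; stop when no single
    removal helps."""
    plan = frozenset(patterns)
    current = cost(plan, patterns)
    while plan:
        best_c, best_p = min((cost(plan - {p}, patterns), p) for p in sorted(plan))
        if best_c >= current:
            break
        plan, current = plan - {best_p}, best_c
    return plan, current


if __name__ == "__main__":
    print(exhaustive(PATTERNS, toy_cost))
    print(greedy(PATTERNS, toy_cost))
```

With this toy model both searches keep only `/a` in the automaton. The exhaustive search evaluates all 2^n subsets, while the greedy descent evaluates at most on the order of n^2 plans, which is the kind of trade-off between search effort and plan quality that the abstract describes.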