ROX: run-time optimization of XQueries

  • Authors:
  • Riham Abdel Kader;Peter Boncz;Stefan Manegold;Maurice van Keulen

  • Affiliations:
  • University of Twente, Enschede, Netherlands;CWI, Amsterdam, Netherlands;CWI, Amsterdam, Netherlands;University of Twente, Enschede, Netherlands

  • Venue:
  • Proceedings of the 2009 ACM SIGMOD International Conference on Management of data
  • Year:
  • 2009

Quantified Score

Hi-index 0.00

Visualization

Abstract

Optimization of complex XQueries combining many XPath steps and joins is currently hindered by the absence of good cardinality estimation and cost models for XQuery. Additionally, the state-of-the-art of even relational query optimization still struggles to cope with cost model estimation errors that increase with plan size, as well as with the effect of correlated joins and selections. In this research, we propose to radically depart from the traditional path of separating the query compilation and query execution phases, by having the optimizer execute, materialize partial results, and use sampling based estimation techniques to observe the characteristics of intermediates. The proposed technique takes as input a Join Graph where the edges are either equi-joins or XPath steps, and the execution environment provides value- and structural-join algorithms, as well as structural and value-based indices. While run-time optimization with sampling removes many of the vulnerabilities of classical optimizers, it brings its own challenges with respect to keeping resource usage under control, both with respect to the materialization of intermediates, as well as the cost of plan exploration using sampling. Our approach deals with these issues by limiting the run-time search space to so-called "zero-investment algorithms for which sampling can be guaranteed to be strictly linear in sample size. All operators and XML value indices used by ROX for sampling have the zero-investment property. We perform extensive experimental evaluation on large XML datasets that shows that our run-time query optimizer finds good query plans in a robust fashion and has limited run-time overhead.