Grouping and optimization of XPath expressions in DB2® pureXML

  • Authors:
  • Andrey Balmin;Fatma Özcan;Ashutosh Singh;Edison Ting

  • Affiliations:
  • IBM Almaden Research Center, San Jose, CA, USA;IBM Almaden Research Center, San Jose, CA, USA;IBM Almaden Research Center, San Jose, CA, USA;IBM Sillicon Valley Lab, San Jose, CA, USA

  • Venue:
  • Proceedings of the 2008 ACM SIGMOD international conference on Management of data
  • Year:
  • 2008

Quantified Score

Hi-index 0.00

Visualization

Abstract

Several XML DBMSs support XQuery and/or SQL/XML languages, which are based on navigational primitives in the form of XPath expressions. Typically, these systems either model each XPath step as a separate query plan operator, or employ holistic approaches that can evaluate multiple steps of a single XPath expression. There have also been proposals to execute as many XPath expressions as possible within a single FLWOR block simultaneously in a data streaming context. We observe that blindly combining all possible XPath expressions for concurrent execution can result in significant performance degradation in a database system. We identify two main problems with this strategy. First, the simple strategy of grouping all XPath expressions on a single document does not always work if the query involves more than one data source or has nested query blocks. Second, merging XPath expressions may result in unnecessary execution of branches that can be filtered by predicates in other branches or elsewhere in the query. To rectify these problems, IBM® DB2® pureXML" adopts a combination of heuristic-based rewrite transformations, to decide which XPath expressions should be grouped for concurrent evaluation, and cost-based optimization to globally order the groups within the query execution plan, and locally order the branches within individual groups. Experimental evaluation confirms that selectively grouping multiple XPath expressions allows for better query evaluation performance and reduces the query optimization complexity. These optimization techniques have been implemented as part of IBM DB2 9.5 (pureXML).