Grouping and optimization of XPath expressions in DB2® pureXML

Authors:
Andrey Balmin;Fatma Özcan;Ashutosh Singh;Edison Ting
Affiliations:
IBM Almaden Research Center, San Jose, CA, USA;IBM Almaden Research Center, San Jose, CA, USA;IBM Almaden Research Center, San Jose, CA, USA;IBM Silicon Valley Lab, San Jose, CA, USA
Venue:
Proceedings of the 2008 ACM SIGMOD international conference on Management of data
Year:
2008

Abstract

Several XML DBMSs support XQuery and/or SQL/XML languages, which are based on navigational primitives in the form of XPath expressions. Typically, these systems either model each XPath step as a separate query plan operator, or employ holistic approaches that can evaluate multiple steps of a single XPath expression. There have also been proposals to execute as many XPath expressions as possible within a single FLWOR block simultaneously in a data streaming context. We observe that blindly combining all possible XPath expressions for concurrent execution can result in significant performance degradation in a database system. We identify two main problems with this strategy. First, the simple strategy of grouping all XPath expressions on a single document does not always work if the query involves more than one data source or has nested query blocks. Second, merging XPath expressions may result in unnecessary execution of branches that can be filtered by predicates in other branches or elsewhere in the query. To rectify these problems, IBM® DB2® pureXML™ adopts a combination of heuristic-based rewrite transformations, which decide which XPath expressions should be grouped for concurrent evaluation, and cost-based optimization, which globally orders the groups within the query execution plan and locally orders the branches within individual groups. Experimental evaluation confirms that selectively grouping multiple XPath expressions allows for better query evaluation performance and reduces the query optimization complexity. These optimization techniques have been implemented as part of IBM DB2 9.5 (pureXML).
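
To make the two-stage idea in the abstract concrete, here is a minimal toy sketch. It is not DB2's implementation or the paper's algorithm. It assumes a hypothetical `XPathExpr` record with a made-up `selectivity` estimate, groups expressions only when they navigate the same data source (the heuristic stage), and then orders branches and groups by estimated selectivity (a stand-in for the cost-based stage). All names and numbers are illustrative assumptions.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class XPathExpr:
    source: str         # hypothetical name of the document/column being navigated
    path: str           # the XPath text
    selectivity: float  # assumed estimate: fraction of documents that match (0..1)


def group_by_source(exprs):
    """Heuristic stage (toy): only expressions over the same source share a group."""
    groups = defaultdict(list)
    for e in exprs:
        groups[e.source].append(e)
    return dict(groups)


def order_plan(groups):
    """Cost-based stage (toy): most selective branch first inside each group,
    then groups ordered by the product of their branch selectivities."""
    ordered = []
    for source, branches in groups.items():
        branches = sorted(branches, key=lambda e: e.selectivity)
        combined = 1.0
        for b in branches:
            combined *= b.selectivity
        ordered.append((combined, source, branches))
    # Tie-break on the source name so the result is deterministic.
    ordered.sort(key=lambda t: (t[0], t[1]))
    return [(source, [b.path for b in branches]) for _, source, branches in ordered]


exprs = [
    XPathExpr("orders", "/order/item/price", 0.5),
    XPathExpr("orders", "/order[@status='open']", 0.25),
    XPathExpr("customers", "/customer/address/city", 0.5),
]
plan = order_plan(group_by_source(exprs))
print(plan)
```

In this sketch the two `orders` expressions end up in one group with the more selective predicate first, and the `customers` expression stays in its own group because it touches a different source. A real optimizer would also account for nested query blocks and for predicates elsewhere in the query, which this toy ignores.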