Statistical learning techniques for costing XML queries

Authors:
Ning Zhang;Peter J. Haas;Vanja Josifovski;Guy M. Lohman;Chun Zhang
Affiliations:
University of Waterloo, Waterloo, ON, Canada;IBM Almaden Research Center, San Jose, CA;IBM Almaden Research Center, San Jose, CA;IBM Almaden Research Center, San Jose, CA;IBM Almaden Research Center, San Jose, CA
Venue:
VLDB '05 Proceedings of the 31st international conference on Very large data bases
Year:
2005


Abstract

Developing cost models for query optimization is significantly harder for XML queries than for traditional relational queries. The reason is that XML query operators are much more complex than relational operators such as table scans and joins. In this paper, we propose a new approach, called COMET, to modeling the cost of XML operators; to our knowledge, COMET is the first method ever proposed for addressing the XML query costing problem. As in relational cost estimation, COMET exploits a set of system catalog statistics that summarizes the XML data; the set of "simple path" statistics that we propose is new, and is well suited to the XML setting. Unlike the traditional approach, COMET uses a new statistical learning technique called "transform regression" instead of detailed analytical models to predict the overall cost. Besides rendering the cost estimation problem tractable for XML queries, COMET has the further advantage of enabling the query optimizer to be self-tuning, automatically adapting to changes over time in the query workload and in the system environment. We demonstrate COMET's feasibility by developing a cost model for the recently proposed XNAV navigational operator. Empirical studies with synthetic, benchmark, and real-world data sets show that COMET can quickly obtain accurate cost estimates for a variety of XML queries and data sets.
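The idea of learning an operator's cost from catalog statistics, rather than deriving it analytically, can be illustrated with a minimal sketch. It makes several assumptions that are not in the abstract: ordinary least squares stands in for the paper's "transform regression", which is a different and more capable technique; the two features (`nodes_visited`, `results_returned`) are hypothetical placeholders rather than the paper's "simple path" statistics; and the training costs are synthetic numbers generated from a known linear rule so that the fit can be checked.

```python
from typing import List, Sequence


def fit_least_squares(X: Sequence[Sequence[float]], y: Sequence[float]) -> List[float]:
    """Fit cost ~ w0 + w1*x1 + ... by solving the normal equations.

    Plain linear regression, used here only as a stand-in for the
    statistical learner; it is NOT transform regression.
    """
    rows = [[1.0, *map(float, r)] for r in X]  # prepend the intercept column
    k = len(rows[0])
    # Normal equations A w = b with A = R^T R and b = R^T y.
    A = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    b = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda i: abs(A[i][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("features are collinear; cannot fit")
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for i in range(k):
            if i != col:
                f = A[i][col] / A[col][col]
                A[i] = [a - f * c for a, c in zip(A[i], A[col])]
                b[i] -= f * b[col]
    return [b[i] / A[i][i] for i in range(k)]


def predict_cost(w: Sequence[float], x: Sequence[float]) -> float:
    """Estimate the cost of a query from its feature vector."""
    return w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))


# Hypothetical training set: (nodes_visited, results_returned) per query.
features = [(100, 10), (200, 5), (50, 20), (400, 40), (300, 1)]
# Synthetic "observed" costs from an assumed rule: 2 + 0.5*visited + 3*results.
costs = [2 + 0.5 * v + 3 * r for v, r in features]

weights = fit_least_squares(features, costs)
estimate = predict_cost(weights, (150, 8))

# Self-tuning, in the simplest possible form: append a newly observed
# (features, cost) pair from query feedback and refit the model.
features.append((250, 12))
costs.append(2 + 0.5 * 250 + 3 * 12)
retuned_weights = fit_least_squares(features, costs)
```

In this sketch the optimizer would call `predict_cost` at planning time and refit periodically as execution feedback accumulates. A real learner would also need to handle nonlinear relationships between statistics and cost, which is the gap that a technique such as transform regression is meant to fill.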