Statistics-based parallelization of XPath queries in shared memory systems

Authors:
Rajesh Bordawekar;Lipyeow Lim;Anastasios Kementsietsidis;Bryant Wei-Lun Kok
Affiliations:
IBM Watson Research Center;University of Hawaii at Manoa;IBM Watson Research Center;IBM Integrated Supply Chain Lab
Venue:
Proceedings of the 13th International Conference on Extending Database Technology
Year:
2010

Abstract

The wide availability of commodity multi-core systems presents an opportunity to address the latency issues that have plagued XML query processing. However, simply executing multiple XML queries over multiple cores addresses only the throughput issue: intra-query parallelization is needed to exploit multiple processing cores for better latency. To this end, this paper investigates the parallelization of individual XPath queries over shared-address-space multi-core processors. Much previous work on parallelizing XPath in a distributed setting failed to exploit the shared-memory parallelism of multi-core systems. We propose a novel, end-to-end parallelization framework that determines the optimal way of parallelizing an XML query. This decision is based on a statistics-based approach that relies on both the query specifics and the data statistics. At each stage of the parallelization process, we evaluate three alternative approaches, namely, data-, query-, and hybrid-partitioning. For a given XPath query, our parallelization algorithm uses XML statistics to estimate the relative efficiencies of these alternatives and find an optimal parallel XPath processing plan. Our experiments using well-known XML documents validate our parallel cost model and optimization framework, and demonstrate that it is possible to accelerate XPath processing using commodity multi-core systems.
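
As a rough illustration of the idea of choosing a partitioning strategy from statistics, the sketch below compares estimated costs for a sequential plan, a data-partitioned plan, and a query-partitioned plan, and picks the cheapest. It is not the paper's cost model: the names `Plan`, `makespan`, and `choose_plan`, the greedy load-balancing rule, and the input numbers are all hypothetical, and hybrid partitioning is left out for brevity. The inputs stand in for statistics-derived estimates: the cost of running the whole query over each data partition, and the cost of running each query branch over the whole document.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Plan:
    strategy: str    # "sequential", "data", or "query" (hypothetical labels)
    est_cost: float  # estimated work on the busiest core


def makespan(task_costs: List[float], cores: int) -> float:
    """Assign tasks longest-first to the least-loaded core (a simple greedy
    heuristic, assumed here) and return the load of the busiest core."""
    loads = [0.0] * cores
    for cost in sorted(task_costs, reverse=True):
        loads[loads.index(min(loads))] += cost
    return max(loads)


def choose_plan(partition_costs: List[float],
                branch_costs: List[float],
                cores: int) -> Plan:
    """Pick the cheapest of three candidate plans.

    partition_costs: estimated cost of the full query on each data partition.
    branch_costs:    estimated cost of each query branch on the full document.
    Both lists are assumed to come from XML statistics; here they are inputs.
    """
    candidates = {
        "sequential": float(sum(partition_costs)),
        "data": makespan(partition_costs, cores),
        "query": makespan(branch_costs, cores),
    }
    best = min(candidates, key=candidates.get)
    return Plan(best, candidates[best])


# Four evenly sized data partitions versus two unevenly sized query branches.
plan = choose_plan([100, 100, 100, 100], [300, 100], cores=4)
print(plan)  # Plan(strategy='data', est_cost=100.0)
```

In this made-up example the data is evenly divisible across four cores, so data partitioning wins; with skewed data and balanced query branches, the same comparison would favor query partitioning.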