Statistics-based parallelization of XPath queries in shared memory systems

  • Authors:
  • Rajesh Bordawekar;Lipyeow Lim;Anastasios Kementsietsidis;Bryant Wei-Lun Kok

  • Affiliations:
  • IBM Watson Research Center;University of Hawaii at Manoa;IBM Watson Research Center;IBM Integrated Supply Chain Lab

  • Venue:
  • Proceedings of the 13th International Conference on Extending Database Technology
  • Year:
  • 2010

Quantified Score

Hi-index 0.00

Visualization

Abstract

The wide availability of commodity multi-core systems presents an opportunity to address the latency issues that have plaqued XML query processing. However, simply executing multiple XML queries over multiple cores merely addresses the throughput issue: intra-query parallelization is needed to exploit multiple processing cores for better latency. Toward this effort, this paper investigates the parallelization of individual XPath queries over shared-address space multi-core processors. Much previous work on parallelizing XPath in a distributed setting failed to exploit the shared memory parallelism of multi-core systems. We propose a novel, end-to-end parallelization framework that determines the optimal way of parallelizing an XML query. This decision is based on a statistics-based approach that relies both on the query specifics and the data statistics. At each stage of the parallelization process, we evaluate three alternative approaches, namely, data-, query-, and hybrid-partitioning. For a given XPath query, our parallelization algorithm uses XML statistics to estimate the relative efficiencies of these different alternatives and find an optimal parallel XPath processing plan. Our experiments using well-known XML documents validate our parallel cost model and optimization framework, and demonstrate that it is possible to accelerate XPath processing using commodity multi-core systems.