XML data partitioning strategies to improve parallelism in parallel holistic twig joins

  • Authors:
  • Imam Machdi;Toshiyuki Amagasa;Hiroyuki Kitagawa

  • Affiliations:
  • University of Tsukuba, Japan;University of Tsukuba, Japan;University of Tsukuba, Japan

  • Venue:
  • Proceedings of the 3rd International Conference on Ubiquitous Information Management and Communication
  • Year:
  • 2009

Abstract

Parallel XML query processing systems that process numerous queries over large heterogeneous XML documents often underperform because of workload imbalance and low CPU and system utilization: conventional partitioning strategies do not serve state-of-the-art query processing algorithms, such as holistic twig joins, well. Consequently, partitioning and distributing heterogeneous XML documents across a parallel cluster system has become an intricate problem for maintaining good query performance. In this paper, we propose XML data partitioning strategies that alleviate the performance degradation caused by workload imbalance, especially for parallel holistic twig join processing. The proposed strategies aim to improve workload balance in both static and dynamic data distribution. In the first strategy, for static data distribution, we refine a high-cost XML partition through a series of partition refinements at successively finer levels of granularity, from document, query, and subquery down to node streams. The granularity level chosen for refining a high-cost partition depends on the overall workload balance in the system. In the second strategy, for dynamic data distribution, we address low system utilization at run time when many nodes in the system are idle. We propose an XML data redistribution approach that partitions XML data on the fly at node-stream granularity.
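
The refinement idea behind the first strategy can be illustrated with a minimal sketch. This is not the authors' algorithm: the abstract gives no cost model, splitting rule, or balance criterion, so everything below is an assumption made for illustration. The names `Partition`, `split`, `assign`, `imbalance`, and `refine`, the even two-way split, the greedy least-loaded assignment, and the `threshold` value are all hypothetical. Only the four granularity levels come from the abstract.

```python
from dataclasses import dataclass

# Granularity levels named in the abstract, coarsest to finest.
LEVELS = ["document", "query", "subquery", "stream"]


@dataclass
class Partition:
    name: str
    cost: float      # assumed: an estimated processing cost per partition
    level: int = 0   # index into LEVELS


def split(p, ways=2):
    """Hypothetical refinement: split a partition evenly at the next finer level."""
    finer = min(p.level + 1, len(LEVELS) - 1)
    return [Partition(f"{p.name}.{i}", p.cost / ways, finer) for i in range(ways)]


def assign(parts, n_nodes):
    """Assumed placement rule: costliest partition first, onto the least-loaded node."""
    loads = [0.0] * n_nodes
    for p in sorted(parts, key=lambda q: q.cost, reverse=True):
        loads[loads.index(min(loads))] += p.cost
    return loads


def imbalance(loads):
    """Ratio of the heaviest node load to the mean load (1.0 means perfectly balanced)."""
    mean = sum(loads) / len(loads)
    return max(loads) / mean if mean else 1.0


def refine(parts, n_nodes, threshold=1.1, max_rounds=20):
    """Refine the costliest refinable partition until the cluster is balanced enough."""
    parts = list(parts)
    for _ in range(max_rounds):
        if imbalance(assign(parts, n_nodes)) <= threshold:
            break
        candidates = [p for p in parts if p.level < len(LEVELS) - 1]
        if not candidates:
            break  # everything is already at node-stream granularity
        worst = max(candidates, key=lambda q: q.cost)
        parts.remove(worst)
        parts.extend(split(worst))
    return parts, assign(parts, n_nodes)


# One expensive document and two cheap ones on a two-node cluster.
parts, loads = refine(
    [Partition("d1", 8.0), Partition("d2", 1.0), Partition("d3", 1.0)], n_nodes=2
)
print([(p.name, LEVELS[p.level], p.cost) for p in parts], loads)
```

In this toy run the expensive document is refined once, to query granularity, after which the two nodes carry equal load and refinement stops. That mirrors the abstract's point that the granularity level is chosen according to overall workload balance rather than fixed in advance.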