GMX: an XML data partitioning scheme for holistic twig joins

  • Authors: Imam Machdi; Toshiyuki Amagasa; Hiroyuki Kitagawa

  • Affiliations: University of Tsukuba, Japan (all three authors)

  • Venue: Proceedings of the 10th International Conference on Information Integration and Web-based Applications & Services

  • Year: 2008

Abstract

Traditional partitioning strategies do not serve semistructured data well, so partitioning and distributing heterogeneous XML documents across a parallel cluster system makes it difficult to maintain good query processing performance. In this paper, we propose a grid metadata model for XML that provides a conceptual view for partitioning XML data, specifically for holistic twig join processing. The proposed model adopts a cost-based model and supports a set of partition refinement methods for workload balancing. Its features are that it significantly reduces workload variance across the cluster system, duplicates XML data where necessary to avoid data dependencies among cluster nodes, and exploits both inter-query and intra-query parallelism. Our experiments evaluate the effectiveness of the proposed model and show that our data partitioning method achieves better workload balance and, as a result, better parallel speedup.
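
To make the idea of cost-based workload balancing concrete, the sketch below assigns partitions with estimated processing costs to cluster nodes and measures the resulting workload variance. It is only an illustration under stated assumptions: the greedy largest-cost-first heuristic, the function names greedy_assign and workload_variance, and the partition costs are all hypothetical and are not the refinement methods or cost model proposed in the paper.

```python
from statistics import pvariance


def greedy_assign(costs, num_nodes):
    """Assign partitions to nodes, largest cost first, each time choosing
    the currently least-loaded node (a generic heuristic, not the GMX method)."""
    loads = [0.0] * num_nodes
    assignment = [[] for _ in range(num_nodes)]
    # Visit partitions in descending order of estimated cost.
    for pid, cost in sorted(costs.items(), key=lambda kv: kv[1], reverse=True):
        # Pick the node with the smallest accumulated load (ties: lowest index).
        target = min(range(num_nodes), key=loads.__getitem__)
        loads[target] += cost
        assignment[target].append(pid)
    return assignment, loads


def workload_variance(loads):
    """Population variance of per-node workloads; lower means better balance."""
    return pvariance(loads)


# Hypothetical estimated costs for five XML partitions.
costs = {"p1": 8.0, "p2": 7.0, "p3": 6.0, "p4": 5.0, "p5": 4.0}
assignment, loads = greedy_assign(costs, 2)
print(assignment, loads, workload_variance(loads))
```

With these made-up costs, a round-robin placement in input order would give node loads of 18 and 12 (variance 9), while the greedy placement gives 17 and 13 (variance 4), which is the kind of variance reduction a balancing step aims for.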