XML data partitioning strategies to improve parallelism in parallel holistic twig joins

Authors:
Imam Machdi; Toshiyuki Amagasa; Hiroyuki Kitagawa
Affiliations:
University of Tsukuba, Japan; University of Tsukuba, Japan; University of Tsukuba, Japan
Venue:
Proceedings of the 3rd International Conference on Ubiquitous Information Management and Communication
Year:
2009

Abstract

Parallel XML query processing systems that evaluate numerous queries over large, heterogeneous XML documents often underperform because of workload imbalance and low CPU and system utilization. Conventional partitioning strategies are poorly suited to state-of-the-art query processing algorithms such as holistic twig joins. Consequently, partitioning and distributing heterogeneous XML documents across a parallel cluster system while maintaining good query performance has become a complex problem. In this paper, we propose XML data partitioning strategies that alleviate the performance degradation caused by workload imbalance, particularly for parallel holistic twig join processing. The proposed strategies aim to improve workload balance under both static and dynamic data distribution. In the first strategy, we refine a high-cost XML partition through a series of partition refinements at increasingly fine levels of granularity: document, query, subquery, and finally node streams. The granularity level chosen for refining a high-cost partition depends on the overall workload balance in the system. The second strategy, for dynamic data distribution, addresses low system utilization when many nodes in the system are idle. For this case, we propose an XML data redistribution approach that partitions XML data on the fly at node-stream granularity.
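The idea behind the first (static) strategy can be illustrated with a small sketch. It is not the authors' algorithm. The following are all hypothetical assumptions made only for illustration:

- the `Unit` tree and its additive cost model;
- the longest-processing-time-first assignment;
- the `threshold` on the max-to-average load ratio;
- every name and cost value.

Each work unit may be split into its children at the next finer granularity (document, query, subquery, node stream). The loop repeatedly takes the most loaded node and splits its largest splittable unit, then reassigns all units. It stops once the imbalance drops below the threshold, so how deep the refinement goes depends on the overall balance.

```python
from __future__ import annotations

import heapq
from dataclasses import dataclass, field

# Granularity levels from coarse to fine, as listed in the abstract.
LEVELS = ["document", "query", "subquery", "node_stream"]


@dataclass
class Unit:
    """A work unit at one granularity level (hypothetical cost model)."""
    name: str
    level: int                      # index into LEVELS
    own_cost: float = 0.0           # cost used only when the unit has no children
    children: list[Unit] = field(default_factory=list)

    @property
    def cost(self) -> float:
        # Assumption: a unit costs the sum of its finer-grained pieces.
        return self.own_cost if not self.children else sum(c.cost for c in self.children)

    def splittable(self) -> bool:
        return bool(self.children)


def assign(units: list[Unit], p: int) -> list[list[Unit]]:
    """Greedy LPT: hand each unit, largest first, to the least-loaded node."""
    heap = [(0.0, i) for i in range(p)]
    bins: list[list[Unit]] = [[] for _ in range(p)]
    for u in sorted(units, key=lambda u: u.cost, reverse=True):
        load, i = heapq.heappop(heap)
        bins[i].append(u)
        heapq.heappush(heap, (load + u.cost, i))
    return bins


def imbalance(bins: list[list[Unit]]) -> float:
    """Ratio of the maximum node load to the average node load."""
    loads = [sum(u.cost for u in b) for b in bins]
    avg = sum(loads) / len(loads)
    return max(loads) / avg if avg else 1.0


def refine(units: list[Unit], p: int, threshold: float = 1.1) -> list[list[Unit]]:
    """Split high-cost units at finer granularity until the load is balanced."""
    units = list(units)
    bins = assign(units, p)
    while imbalance(bins) > threshold:
        heaviest = max(bins, key=lambda b: sum(u.cost for u in b))
        candidates = [u for u in heaviest if u.splittable()]
        if not candidates:
            break  # nothing left to refine on the bottleneck node
        target = max(candidates, key=lambda u: u.cost)
        # Replace the unit by its children (next finer granularity level).
        units = [u for u in units if u is not target] + target.children
        bins = assign(units, p)
    return bins


def example() -> list[list[Unit]]:
    """Two nodes; document d1 (cost 10) dwarfs d2 (cost 2). Costs are made up."""
    d1 = Unit("d1", 0, children=[
        Unit("d1/q1", 1, children=[Unit("d1/q1/s1", 2, 3.0), Unit("d1/q1/s2", 2, 3.0)]),
        Unit("d1/q2", 1, 4.0),
    ])
    d2 = Unit("d2", 0, 2.0)
    return refine([d1, d2], p=2)


if __name__ == "__main__":
    for i, b in enumerate(example()):
        print(i, [(u.name, LEVELS[u.level]) for u in b], sum(u.cost for u in b))
```

In this example, d1 is refined only down to query level. Splitting it into `d1/q1` and `d1/q2` already gives both nodes a load of 6.0, so the subquery units are never touched. The refinement stops at the coarsest granularity that yields an acceptable balance.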