The importance of sibling clustering for efficient bulkload of XML document trees

Authors:
C. C. Kanne;G. Moerkotte
Venue:
IBM Systems Journal
Year:
2006

Citing 8

On the performance of object clustering techniques
SIGMOD '92 Proceedings of the 1992 ACM SIGMOD international conference on Management of data

A clustering algorithm for hierarchical structures
ACM Transactions on Database Systems (TODS)

ToXgene: a template-based data generator for XML
Proceedings of the 2002 ACM SIGMOD international conference on Management of data

Anatomy of a native XML base management system
The VLDB Journal — The International Journal on Very Large Data Bases

Efficient Storage of XML Data
ICDE '00 Proceedings of the 16th International Conference on Data Engineering

System RX: one part relational, one part XML
Proceedings of the 2005 ACM SIGMOD international conference on Management of data

Cost-sensitive reordering of navigational primitives
Proceedings of the 2005 ACM SIGMOD international conference on Management of data

XMark: a benchmark for XML data management
VLDB '02 Proceedings of the 28th international conference on Very Large Data Bases

Cited 2

A linear time algorithm for optimal tree sibling partitioning and approximation algorithms in Natix
VLDB '06 Proceedings of the 32nd international conference on Very large data bases

Partial Evaluation for Distributed XPath Query Processing and Beyond
ACM Transactions on Database Systems (TODS)

Abstract

In an XML Data Store (XDS), importing documents from external sources is a very frequent operation. Because a document import consists of a large number of individual node inserts, it is essentially a small bulkload operation, and thus efficient bulkload support is crucial for the performance of the XDS. The bulkload operation is in essence a mapping of an XML parser's output into the storage structures of the XDS. This involves two major subtasks: (1) partitioning the document's logical tree structure into subtrees that can be stored on a page in a way that is both space-efficient and suitable for later processing and (2) mapping the subtrees to the internal representation of the XDS for paging. In enterprise-scale environments with very large documents and many parallel bulkload operations, the first task is particularly challenging, as not only disk space consumption, but also CPU and main-memory usage are important factors. In this paper, we discuss the requirements for an XDS bulkload component and examine existing algorithms for tree partitioning and their applicability to the bulkload operation. We derive a new tree-partitioning algorithm for use in the bulkload operation and present the design of the bulkload component for the XDS Natix. Finally, we evaluate the performance of the bulkload component and compare our results with previous work.
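
To make subtask (1) concrete, the following is a minimal sketch of a greedy, bottom-up tree partitioning with sibling clustering. It is not the algorithm derived in the paper, and it is not optimal; it only illustrates why packing adjacent siblings onto one page saves space. The names `Node`, `partition`, the per-node `weight` (storage size), and the page `capacity` are all assumptions introduced for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Hypothetical document node; weight is its assumed storage size."""
    name: str
    weight: int
    children: List["Node"] = field(default_factory=list)


def partition(root: Node, capacity: int) -> List[List[str]]:
    """Greedy heuristic: return partitions as lists of subtree-root names.

    Each partition holds consecutive siblings (or the document root) whose
    remaining subtrees together fit into one page of size `capacity`.
    """
    partitions: List[List[str]] = []

    def visit(node: Node) -> int:
        # Returns the weight of the part of node's subtree still attached to it.
        if node.weight > capacity:
            raise ValueError(f"node {node.name!r} does not fit on a page")
        residuals = [(child, visit(child)) for child in node.children]
        total = node.weight + sum(w for _, w in residuals)
        if total <= capacity:
            return total  # whole subtree still fits; decide further up
        # Overflow: cut off all children, packing adjacent siblings together.
        cluster: List[str] = []
        cluster_weight = 0
        for child, w in residuals:
            if cluster and cluster_weight + w > capacity:
                partitions.append(cluster)
                cluster, cluster_weight = [], 0
            cluster.append(child.name)
            cluster_weight += w
        if cluster:
            partitions.append(cluster)
        return node.weight

    visit(root)
    partitions.append([root.name])  # the remainder around the root
    return partitions


# Usage: a root of size 1 with four leaf children of size 3, page capacity 8.
doc = Node("root", 1, [Node(n, 3) for n in "abcd"])
pages = partition(doc, capacity=8)
print(pages)  # [['a', 'b'], ['c', 'd'], ['root']]
```

In this toy input, cutting every child into its own partition would need five pages, whereas clustering adjacent siblings needs three. The sketch uses recursion and keeps the whole tree in memory, so it ignores the CPU and main-memory constraints the abstract highlights for very large documents.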