On the performance of object clustering techniques
SIGMOD '92 Proceedings of the 1992 ACM SIGMOD international conference on Management of data
A clustering algorithm for hierarchical structures
ACM Transactions on Database Systems (TODS)
ToXgene: a template-based data generator for XML
Proceedings of the 2002 ACM SIGMOD international conference on Management of data
Anatomy of a native XML base management system
The VLDB Journal — The International Journal on Very Large Data Bases
ICDE '00 Proceedings of the 16th International Conference on Data Engineering
System RX: one part relational, one part XML
Proceedings of the 2005 ACM SIGMOD international conference on Management of data
Cost-sensitive reordering of navigational primitives
Proceedings of the 2005 ACM SIGMOD international conference on Management of data
XMark: a benchmark for XML data management
VLDB '02 Proceedings of the 28th international conference on Very Large Data Bases
A linear time algorithm for optimal tree sibling partitioning and approximation algorithms in Natix
VLDB '06 Proceedings of the 32nd international conference on Very large data bases
Partial Evaluation for Distributed XPath Query Processing and Beyond
ACM Transactions on Database Systems (TODS)
Hi-index | 0.00 |
In an XML Data Store (XDS), importing documents from external sources is a very frequent operation. Because a document import consists of a large number of individual node inserts, it is essentially a small bulkload operation, and thus efficient bulkload support is crucial for the performance of the XDS. The bulkload operation is in essence a mapping of an XML parser's output into the storage structures of the XDS. This involves two major subtasks: (1) partitioning the document's logical tree structure into subtrees that can be stored on a page in a way that is both space-efficient and suitable for later processing and (2) mapping the subtrees to the internal representation of the XDS for paging. In enterprise-scale environments with very large documents and many parallel bulkload operations, the first task is particularly challenging, as not only disk space consumption, but also CPU and main-memory usage are important factors. In this paper, we discuss the requirements for an XDS bulkload component and examine existing algorithms for tree partitioning and their applicability to the bulkload operation. We derive a new tree-partitioning algorithm for use in the bulkload operation and present the design of the bulkload component for the XDS Natix. Finally, we evaluate the performance of the bulkload component and compare our results with previous work.