Inserting a document into a native XML Data Store (XDS) requires partitioning the document tree into a number of storage units with limited capacity, such as records on disk pages. Because intra-partition navigation is much faster than navigation between partitions, minimizing the number of partitions has a beneficial effect on query performance.

We present a linear-time algorithm to optimally partition an ordered, labeled, weighted tree such that each partition does not exceed a fixed weight limit. Whereas traditional tree partitioning algorithms only allow child nodes to share a partition with their parent node (i.e., a partition corresponds to a subtree), our algorithm also considers partitions containing several subtrees, as long as their roots are adjacent siblings. We call this sibling partitioning.

Based on our study of the optimal algorithm, we further introduce two novel, near-optimal heuristics. They are easier to implement, do not need to hold the whole document instance in memory, and require much less runtime than the optimal algorithm.

Finally, we provide an experimental study comparing our novel algorithms with existing ones. One important finding is that, compared to partitioning that considers parent-child partitions exclusively, also including sibling partitioning can decrease the total number of partitions by more than 90% and improve query performance by more than a factor of two.
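To make the difference between parent-child partitioning and sibling partitioning concrete, the following is a minimal sketch of a greedy bottom-up heuristic. It is not the optimal algorithm or either of the heuristics described above; the names `Node` and `count_partitions`, the rule of cutting the heaviest child subtrees first, and the left-to-right packing of adjacent cut siblings are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    # Hypothetical tree node: a weight (e.g. bytes on a page) and ordered children.
    weight: int
    children: List["Node"] = field(default_factory=list)


def count_partitions(root: Node, limit: int, allow_siblings: bool) -> int:
    """Count partitions produced by an illustrative greedy heuristic.

    Assumption: this is a simplified stand-in, not the paper's algorithm.
    """
    partitions = 0

    def residual(node: Node) -> int:
        # Returns the weight that stays in the same partition as `node`.
        nonlocal partitions
        if node.weight > limit:
            raise ValueError("a single node exceeds the weight limit")
        res = [residual(child) for child in node.children]
        total = node.weight + sum(res)

        # Cut the heaviest child subtrees until the rest fits with the parent.
        cut = set()
        for i in sorted(range(len(res)), key=lambda i: -res[i]):
            if total <= limit:
                break
            cut.add(i)
            total -= res[i]

        # Turn the cut children into partitions, scanning in sibling order.
        run_weight = None  # weight of the currently open sibling partition
        for i in range(len(res)):
            if i not in cut:
                run_weight = None  # a kept child breaks sibling adjacency
                continue
            if allow_siblings and run_weight is not None and run_weight + res[i] <= limit:
                run_weight += res[i]  # share the partition of the left neighbour
            else:
                partitions += 1
                run_weight = res[i]
        return total

    residual(root)
    return partitions + 1  # plus the partition that contains the root


# A root of weight 1 with six leaf children of weight 1, weight limit 3.
tree = Node(1, [Node(1) for _ in range(6)])
print(count_partitions(tree, limit=3, allow_siblings=False))  # prints 5
print(count_partitions(tree, limit=3, allow_siblings=True))   # prints 3
```

In this toy tree, four leaves must be cut from the root. Without sibling partitioning each cut leaf becomes its own partition, whereas with it three adjacent leaves share one partition, so the count drops from 5 to 3. The sketch uses recursion for brevity and would need an explicit stack for very deep documents.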