A linear time algorithm for optimal tree sibling partitioning and approximation algorithms in Natix

Authors:
Carl-Christian Kanne;Guido Moerkotte
Affiliations:
Department of Mathematics and Computer Science, University of Mannheim;Department of Mathematics and Computer Science, University of Mannheim
Venue:
VLDB '06 Proceedings of the 32nd international conference on Very large data bases
Year:
2006

Abstract

Document insertion into a native XML Data Store (XDS) requires partitioning the document tree into a number of storage units with limited capacity, such as records on disk pages. Because intra-partition navigation is much faster than navigation between partitions, minimizing the number of partitions has a beneficial effect on query performance.

We present a linear-time algorithm to optimally partition an ordered, labeled, weighted tree such that no partition exceeds a fixed weight limit. Whereas traditional tree partitioning algorithms only allow child nodes to share a partition with their parent node (i.e., a partition corresponds to a subtree), our algorithm also considers partitions containing several subtrees as long as their roots are adjacent siblings. We call this sibling partitioning.

Based on our study of the optimal algorithm, we further introduce two novel, near-optimal heuristics. They are easier to implement, do not need to hold the whole document instance in memory, and require much less runtime than the optimal algorithm.

Finally, we provide an experimental study comparing our novel algorithms with existing ones. One important finding is that, compared to partitioning that exclusively considers parent-child partitions, additionally allowing sibling partitioning can decrease the total number of partitions by more than 90% and improve query performance by more than a factor of two.
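
To make the difference between parent-child partitioning and sibling partitioning concrete, the following is a minimal sketch in Python. It is not the optimal linear-time algorithm of the paper, nor one of its two heuristics. It is a simple greedy illustration under stated assumptions: the `Node` class, the `count_partitions` function, and the rule "cut the heaviest child subtrees first" are all hypothetical choices made for this example only. With `siblings=False`, every cut child subtree becomes its own partition. With `siblings=True`, adjacent cut siblings may share a partition as long as the weight limit holds.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical ordered, weighted tree node (illustration only)."""
    weight: int
    children: List["Node"] = field(default_factory=list)


def count_partitions(root: Node, limit: int, siblings: bool) -> int:
    """Greedy bottom-up partition count (a sketch, not the paper's algorithm).

    If `siblings` is True, adjacent cut siblings may share one partition.
    """
    closed = 0  # number of partitions that are already finalized

    def visit(node: Node) -> int:
        nonlocal closed
        if node.weight > limit:
            raise ValueError("a single node exceeds the weight limit")
        # Weight still attached to each child after its own subtree was cut.
        residual = [visit(child) for child in node.children]
        total = node.weight + sum(residual)
        cut = [False] * len(residual)
        # Assumed greedy rule: cut the heaviest children until the rest fits.
        for i in sorted(range(len(residual)), key=lambda i: -residual[i]):
            if total <= limit:
                break
            cut[i] = True
            total -= residual[i]
        # Close partitions for cut children, scanning in sibling order.
        load: Optional[int] = None  # weight of the open sibling partition
        for i, w in enumerate(residual):
            if not cut[i]:
                load = None  # a kept child breaks sibling adjacency
            elif siblings and load is not None and load + w <= limit:
                load += w  # share the partition with the left neighbor
            else:
                closed += 1  # start a new partition for this subtree
                load = w
        return total  # weight that stays in the parent's partition

    visit(root)
    return closed + 1  # the root's own partition is closed last


# A root with ten unit-weight leaf children and a weight limit of 4.
flat = Node(1, [Node(1) for _ in range(10)])
print(count_partitions(flat, limit=4, siblings=False))  # 8
print(count_partitions(flat, limit=4, siblings=True))   # 3
```

In this toy input the root keeps three children in its own partition either way. Without sibling partitioning, the seven remaining leaves each occupy a partition of their own; with it, they are packed into two partitions of adjacent siblings. The sketch uses recursion for brevity, so a very deep tree would need an iterative traversal instead.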