XML is one of the primary encoding schemes for data and knowledge. We investigate incremental physical data clustering in systems that store XML documents in a native format. We formulate the XML clustering problem as a partitioning problem on a tree augmented with sibling edges, and propose the PIXSAR (Practical Incremental XML Sibling Augmented Reclustering) algorithm for incrementally clustering XML documents. We show the fundamental importance of workload-driven dynamic rearrangement of storage. PIXSAR incrementally executes reclustering operations on selected subgraphs of the global augmented document tree. The subgraphs are implied by significant changes in the workload. As the workload changes, PIXSAR incrementally adjusts the XML data layout to better fit the workload. PIXSAR's main parameters are the radius, in pages, of the augmented portion to be reclustered and the way reclustering is triggered. We briefly explore some of the effects of indexes; a full treatment of indexes is the subject of another paper. We use an experimental data clustering system that includes a fast disk simulator and a file system simulator for storing native XML data. We use a novel method for 'exporting' the Saxon query processor into our setting. Experimental results indicate that PIXSAR significantly reduces the number of page faults (counting all page faults incurred while querying the document as well as during maintenance operations), thereby improving query performance.
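To make the formulation concrete, the following is a minimal Python sketch, not the PIXSAR algorithm itself. It builds a tree augmented with sibling edges, scores a page assignment by the workload weight of the edges that cross page boundaries, and collects the pages within a given radius of a trigger page. All names (`augmented_edges`, `crossing_cost`, `pages_within_radius`), the edge weights, and the reading of "radius" as hop distance in the page graph are assumptions made for illustration only.

```python
from collections import deque

def augmented_edges(children):
    """Return parent-child edges plus next-sibling edges.

    `children` maps each parent to its ordered list of children.
    """
    edges = []
    for parent, kids in children.items():
        for kid in kids:
            edges.append((parent, kid))          # tree edge
        for left, right in zip(kids, kids[1:]):
            edges.append((left, right))          # sibling edge
    return edges

def crossing_cost(edges, weight, page_of):
    """Sum the workload weights of edges whose endpoints sit on different pages."""
    return sum(weight.get(e, 0) for e in edges
               if page_of[e[0]] != page_of[e[1]])

def pages_within_radius(edges, page_of, start_page, radius):
    """Pages reachable from start_page in at most `radius` hops.

    Assumption: two pages are adjacent when some augmented edge crosses them.
    """
    adjacent = {}
    for u, v in edges:
        pu, pv = page_of[u], page_of[v]
        if pu != pv:
            adjacent.setdefault(pu, set()).add(pv)
            adjacent.setdefault(pv, set()).add(pu)
    distance = {start_page: 0}
    queue = deque([start_page])
    while queue:
        page = queue.popleft()
        if distance[page] == radius:
            continue                             # do not expand past the radius
        for nxt in adjacent.get(page, ()):
            if nxt not in distance:
                distance[nxt] = distance[page] + 1
                queue.append(nxt)
    return set(distance)

# Hypothetical document tree, page assignment, and traversal counts.
children = {"a": ["b", "c", "d"], "b": ["e", "f"]}
page_of = {"a": 0, "b": 0, "c": 1, "d": 1, "e": 2, "f": 2}
weight = {("a", "b"): 5, ("b", "c"): 3, ("c", "d"): 4,
          ("b", "e"): 2, ("e", "f"): 6}

edges = augmented_edges(children)
cost = crossing_cost(edges, weight, page_of)
nearby = pages_within_radius(edges, page_of, start_page=1, radius=1)
```

In this toy layout only the edges (b, c) and (b, e) carry weight across a page boundary. A reclustering step in this spirit would repartition just the nodes on the `nearby` pages and keep the new layout only if it lowers the crossing cost.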