PIXSAR: incremental reclustering of augmented XML trees

Authors:
Lila Shnaiderman;Oded Shmueli;Rajesh Bordawekar
Affiliations:
Technion, Haifa, Israel;Technion, Haifa, Israel;IBM T. J. Watson Research Center, Hawthorne, NY, USA
Venue:
Proceedings of the 10th ACM workshop on Web information and data management
Year:
2008

Abstract

XML is one of the primary encoding schemes for data and knowledge. We investigate incremental physical data clustering in systems that store XML documents in a native format. We formulate the XML clustering problem as a partitioning problem on a tree augmented with sibling edges, and propose the PIXSAR (Practical Incremental XML Sibling Augmented Reclustering) algorithm for incrementally clustering XML documents. We show the fundamental importance of workload-driven dynamic rearrangement of storage. PIXSAR incrementally executes reclustering operations on selected subgraphs of the global augmented document tree; these subgraphs are implied by significant changes in the workload. As the workload changes, PIXSAR incrementally adjusts the XML data layout to better fit the workload. PIXSAR's main parameters are the radius, in pages, of the augmented portion to be reclustered and the way reclustering is triggered. We briefly explore some of the effects of indexes; a full treatment of indexes is the subject of another paper. We use an experimental data clustering system that includes a fast disk simulator and a file system simulator for storing native XML data. We use a novel method for 'exporting' the Saxon query processor into our setting. Experimental results indicate that PIXSAR significantly reduces the number of page faults (counting all page faults incurred while querying the document as well as during maintenance operations), thereby improving query performance.
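
The two parameters named in the abstract, a radius measured in pages and a reclustering trigger, can be illustrated with a minimal sketch. This is not the PIXSAR algorithm itself: the abstract does not specify how the region is selected or how the trigger is evaluated. The sketch assumes that pages form an adjacency map (two pages are linked when a parent-child or sibling edge of the augmented tree crosses between them), that the radius counts page hops, and that the trigger is a simple threshold on cross-page traversals. The names `pages_within_radius`, `should_trigger`, `page_links`, and `threshold` are all hypothetical.

```python
from collections import deque


def pages_within_radius(page_links, start_page, radius):
    """Collect the pages reachable from start_page in at most `radius` hops.

    page_links is an assumed adjacency map: page id -> iterable of page ids
    connected to it by at least one tree or sibling edge.
    """
    seen = {start_page}
    frontier = deque([(start_page, 0)])
    while frontier:
        page, dist = frontier.popleft()
        if dist == radius:
            continue  # do not expand past the radius
        for neighbor in page_links.get(page, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                frontier.append((neighbor, dist + 1))
    return seen


def should_trigger(cross_page_traversals, threshold):
    """Assumed trigger: recluster once enough traversals cross a page boundary."""
    return cross_page_traversals >= threshold


# Hypothetical layout: five pages linked in a small tree-like shape.
page_links = {0: [1, 2], 1: [0, 3], 2: [0], 3: [1, 4], 4: [3]}

if should_trigger(cross_page_traversals=12, threshold=10):
    # Only this region would be handed to a repartitioning step.
    region = pages_within_radius(page_links, start_page=0, radius=2)
```

Under these assumptions, a larger radius gives the repartitioning step more pages to rearrange at a higher maintenance cost, which is the trade-off the radius parameter controls.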