iPIXSAR: incremental clustering of indexed XML data

Authors:
Lila Shnaiderman;Oded Shmueli
Affiliations:
Technion, Haifa, Israel;Technion, Haifa, Israel
Venue:
Proceedings of the 2009 EDBT/ICDT Workshops
Year:
2009

References:
Effective clustering of complex objects in object-oriented databases. SIGMOD '91 Proceedings of the 1991 ACM SIGMOD international conference on Management of data
On the performance of object clustering techniques. SIGMOD '92 Proceedings of the 1992 ACM SIGMOD international conference on Management of data
A clustering algorithm for hierarchical structures. ACM Transactions on Database Systems (TODS)
Optimal Sequential Partitions of Graphs. Journal of the ACM (JACM)
Vclusters: a flexible, fine-grained object clustering mechanism. Proceedings of the 13th ACM SIGPLAN conference on Object-oriented programming, systems, languages, and applications
Partition-Based Clustering in Object Bases: From Theory to Practice. FODO '93 Proceedings of the 4th International Conference on Foundations of Data Organization and Algorithms
Deriving Program Physical Structures Using Bond Energy Algorithm. APSEC '99 Proceedings of the Sixth Asia Pacific Software Engineering Conference
A Succinct Physical Storage Scheme for Efficient Evaluation of Path Queries in XML. ICDE '04 Proceedings of the 20th International Conference on Data Engineering
System RX: one part relational, one part XML. Proceedings of the 2005 ACM SIGMOD international conference on Management of data
XML Document Indexes: A Classification. IEEE Internet Computing
A linear time algorithm for optimal tree sibling partitioning and approximation algorithms in Natix. VLDB '06 Proceedings of the 32nd international conference on Very large data bases
An algorithm for partitioning trees augmented with sibling edges. Information Processing Letters
PIXSAR: incremental reclustering of augmented XML trees. Proceedings of the 10th ACM workshop on Web information and data management
Efficient algorithm for the partitioning of trees. IBM Journal of Research and Development

Abstract

XML is one of the primary encoding schemes for data and knowledge. We investigate incremental physical data clustering for native XML data. We view the XML clustering problem as a sibling-augmented tree partitioning problem. In previous work we suggested an incremental algorithm, PIXSAR, that adjusts the storage structure based on the workload so as to minimize page faults [24]. However, in addition to the XML augmented tree, there are also indices. The kind of index we consider is based on an XPath expression and consists of index entries pointing to XML target nodes. Using such index entries, one "jumps" directly to target nodes. Often, target XML nodes are accessed in temporal proximity and hence, for paging reasons, it is beneficial to store them on the same disk page. In other cases, such temporal proximity is absent and hence co-storing is not optimal. Designing an algorithm that views the XML data and indices as a sibling-augmented tree with multiple roots (the additional roots correspond to indices) is complex. In this work we propose an extension to the PIXSAR algorithm, called iPIXSAR, which makes storage decisions for target XML nodes based on their possible membership in more than one tree. We use an experimental data clustering system that includes a disk and file system simulator. Instead of implementing a query processor, we "record" logs of Saxon runs and emulate them in our system. To make experimentation feasible, we constructed a disk simulator that operates in main memory. In [24], we compared PIXSAR to DFS (an efficient static storage scheme) on a simulated disk and on a real physical disk. We found that PIXSAR results are better than DFS by 20% to 50% on a simulated disk and usually by 60% on a real disk. In this work we experimentally show that, in the presence of indices, iPIXSAR is superior to PIXSAR by up to 8%.
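
The paging argument in the abstract (index targets accessed in temporal proximity benefit from sharing a disk page) can be illustrated with a minimal sketch. This is not the iPIXSAR algorithm. It only counts page faults for a replayed access trace under two node-to-page layouts. The LRU buffer policy, the function name `count_page_faults`, the node names, and the toy trace are all assumptions made for illustration.

```python
from collections import OrderedDict


def count_page_faults(trace, page_of, buffer_pages):
    """Replay a node-access trace and count page faults.

    trace        -- sequence of node ids in access order (hypothetical log)
    page_of      -- dict mapping node id -> disk page id (the layout)
    buffer_pages -- number of pages the buffer can hold (LRU is assumed)
    """
    buffer = OrderedDict()  # page ids, least recently used first
    faults = 0
    for node in trace:
        page = page_of[node]
        if page in buffer:
            buffer.move_to_end(page)  # hit: mark page as most recently used
        else:
            faults += 1  # miss: the page must be read from disk
            buffer[page] = None
            if len(buffer) > buffer_pages:
                buffer.popitem(last=False)  # evict least recently used page
    return faults


# Hypothetical workload: an index lookup jumps to targets t1, t2, t3
# back to back, then an unrelated node x is read; this repeats once.
trace = ["t1", "t2", "t3", "x", "t1", "t2", "t3", "x"]

# Layout A: each index target sits on its own page.
scattered = {"t1": 0, "t2": 1, "t3": 2, "x": 3}
# Layout B: the index targets are co-stored on one page.
co_stored = {"t1": 0, "t2": 0, "t3": 0, "x": 3}

faults_scattered = count_page_faults(trace, scattered, buffer_pages=2)
faults_co_stored = count_page_faults(trace, co_stored, buffer_pages=2)
print(faults_scattered, faults_co_stored)
```

With a two-page buffer, the scattered layout faults on every access in this toy trace, while the co-stored layout faults only on the first touch of each page. When the targets are not accessed together, co-storing brings no such gain, which is the trade-off the abstract describes.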