iPIXSAR: incremental clustering of indexed XML data

  • Authors:
  • Lila Shnaiderman; Oded Shmueli

  • Affiliations:
  • Technion, Haifa, Israel; Technion, Haifa, Israel

  • Venue:
  • Proceedings of the 2009 EDBT/ICDT Workshops
  • Year:
  • 2009

Abstract

XML is one of the primary encoding schemes for data and knowledge. We investigate incremental physical data clustering for native XML data. We view the XML clustering problem as a sibling-augmented tree partitioning problem. In previous work we proposed PIXSAR, an incremental algorithm that adjusts the storage structure according to the workload so as to minimize page faults [24]. However, in addition to the sibling-augmented XML tree, there are also indices. The kind of index we consider is based on an XPath expression and consists of index entries pointing to XML target nodes. Using such index entries, one "jumps" directly to target nodes. Often, target nodes are accessed in temporal proximity, so for paging reasons it is beneficial to store them on the same disk page. In other cases such temporal proximity is absent, and co-storing them is not optimal. Designing an algorithm that views the XML data and indices as a sibling-augmented tree with multiple roots (the additional roots correspond to indices) is complex. In this work we propose iPIXSAR, an extension of PIXSAR that makes storage decisions for target XML nodes based on their possible membership in more than one tree. We use an experimental data clustering system that includes a disk and file system simulator. Instead of implementing a query processor, we "record" logs of Saxon runs and emulate them in our system. To make experimentation feasible, we constructed a disk simulator that operates in main memory. In [24], we compared PIXSAR to DFS (an efficient static storage scheme) on both a simulated and a real physical disk, and found that PIXSAR's results are better than those of DFS by 20% to 50% on the simulated disk and usually by 60% on the real disk. In this work we show experimentally that, in the presence of indices, iPIXSAR is superior to PIXSAR by up to 8%.
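The paging argument behind co-storing index target nodes can be made concrete with a toy model. The sketch below is not PIXSAR or iPIXSAR; it is a hypothetical illustration that assumes an LRU buffer pool, made-up node names (`idx`, `t1`..`t4`, `a`..`d`), and two hand-built layouts. The helper `count_page_faults` is likewise hypothetical: it counts page faults for an access trace under a given node-to-page assignment.

```python
from collections import OrderedDict


def count_page_faults(trace, page_of, frames):
    """Count page faults for a node-access trace under an LRU buffer.

    Hypothetical toy model, not the paper's simulator.
    trace   -- node ids in access order
    page_of -- dict mapping node id -> disk page id (the storage layout)
    frames  -- number of page frames in the assumed LRU buffer pool
    """
    if frames < 1:
        raise ValueError("frames must be at least 1")
    buffer = OrderedDict()  # page id -> None, least recently used first
    faults = 0
    for node in trace:
        page = page_of[node]
        if page in buffer:
            buffer.move_to_end(page)  # hit: mark page most recently used
        else:
            faults += 1  # miss: page must be read from disk
            if len(buffer) >= frames:
                buffer.popitem(last=False)  # evict least recently used page
            buffer[page] = None
    return faults


# Layout A (assumed): each index target stored with its tree neighbour.
layout_a = {"idx": 0, "t1": 1, "a": 1, "t2": 2, "b": 2,
            "t3": 3, "c": 3, "t4": 4, "d": 4}
# Layout B (assumed): all index target nodes co-stored on one page.
layout_b = {"idx": 0, "t1": 1, "t2": 1, "t3": 1, "t4": 1,
            "a": 2, "b": 2, "c": 2, "d": 2}

# Index-driven access: jump from the index page to each target in turn.
index_trace = ["idx", "t1", "t2", "t3", "t4"]
# Navigational access: each target is visited together with its neighbour.
tree_trace = ["t1", "a", "t2", "b", "t3", "c", "t4", "d"]

for name, layout in [("A", layout_a), ("B", layout_b)]:
    print(name,
          count_page_faults(index_trace, layout, frames=1),
          count_page_faults(tree_trace, layout, frames=1))
# prints:
# A 5 4
# B 2 8
```

With a one-frame buffer, co-storing the targets (layout B) cuts the faults of the index-driven trace from 5 to 2 but raises those of the navigational trace from 4 to 8. This mirrors the trade-off described above: co-storing helps when targets are accessed in temporal proximity and hurts when they are not, so a fixed rule in either direction is not optimal.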