Multiresolution Indexing of XML for Frequent Queries

  • Authors:
  • Hao He; Jun Yang

  • Venue:
  • ICDE '04 Proceedings of the 20th International Conference on Data Engineering
  • Year:
  • 2004

Abstract

XML and other types of semi-structured data are typically represented by a labeled directed graph. To speed up path expression queries over the graph, a variety of structural indexes have been proposed. They usually work by partitioning nodes in the data graph into equivalence classes and storing equivalence classes as index nodes. A(k)-index introduces the concept of local bisimilarity for partitioning, allowing the trade-off between index size and query answering power. However, all index nodes in A(k)-index have the same local similarity k, which cannot take advantage of the fact that a workload may contain path expressions of different lengths, or that different parts of the data graph may have different local similarity requirements. To overcome these limitations, we propose M(k)- and M*(k)-indexes. The basic M(k)-index is workload-aware: like the previously proposed D(k)-index, it allows different index nodes to have different local similarity requirements, providing finer partitioning only for parts of the data graph targeted by longer path expressions. Unlike D(k)-index, M(k)-index is never over-refined for irrelevant index or data nodes. However, the workload-aware feature still incurs over-refinement due to over-qualified parent index nodes. Moreover, fine partitions penalize the performance of short path expressions. To solve these problems, we further propose the M*(k)-index. An M*(k)-index consists of a collection of indexes whose nodes are organized in a partition hierarchy, allowing successively coarser partitioning information to co-exist with the finest partitioning information required. Experiments show that our indexes are superior to previously proposed indexes in terms of index size and query performance.
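
To make the idea of partitioning by local bisimilarity concrete, the following is a minimal Python sketch of the uniform-k partitioning that the abstract attributes to the A(k)-index. It assumes the usual definition of k-bisimilarity: two nodes are 0-bisimilar if they carry the same label, and k-bisimilar if they are (k-1)-bisimilar and their parents fall into the same set of (k-1)-bisimilarity classes. The function name `k_bisim_partition`, the input encoding, and the toy graph are illustrative assumptions, not taken from the paper, and the sketch does not attempt the workload-aware M(k) or M*(k) constructions.

```python
from collections import defaultdict


def k_bisim_partition(labels, edges, k):
    """Partition graph nodes into k-bisimilarity classes (illustrative sketch).

    labels: dict mapping node -> label (hypothetical input encoding)
    edges:  iterable of (parent, child) pairs
    k:      local similarity; larger k gives a finer partition
    """
    parents = defaultdict(set)
    for parent, child in edges:
        parents[child].add(parent)

    # 0-bisimilarity: nodes are equivalent iff they share a label.
    block = dict(labels)

    for _ in range(k):
        ids = {}
        new_block = {}
        for node in labels:
            # Signature: the node's previous class plus the set of
            # previous classes of its parents.
            sig = (block[node],
                   frozenset(block[p] for p in parents.get(node, ())))
            new_block[node] = ids.setdefault(sig, len(ids))
        block = new_block

    classes = defaultdict(list)
    for node, b in block.items():
        classes[b].append(node)
    return sorted(sorted(c) for c in classes.values())


# Toy data graph (assumed for illustration): root -> a -> c -> d
#                                            root -> b -> c -> d
labels = {1: "root", 2: "a", 3: "b", 4: "c", 5: "c", 6: "d", 7: "d"}
edges = [(1, 2), (1, 3), (2, 4), (3, 5), (4, 6), (5, 7)]

for k in range(3):
    print(k, k_bisim_partition(labels, edges, k))
```

On this toy graph, k = 0 groups the two "c" nodes and the two "d" nodes together, k = 1 separates the "c" nodes because their parents differ, and k = 2 also separates the "d" nodes. Each increase in k refines the previous partition, which is the index-size versus answering-power trade-off the abstract describes; a single global k forces that refinement on every node, whether or not the workload needs it.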