Pattern tree algebras: sets or sequences?

  • Authors:
  • Stelios Paparizos;H. V. Jagadish

  • Affiliations:
  • University of Michigan, Ann Arbor, MI;University of Michigan, Ann Arbor, MI

  • Venue:
  • VLDB '05 Proceedings of the 31st international conference on Very large data bases
  • Year:
  • 2005

Quantified Score

Hi-index 0.00

Visualization

Abstract

XML and XQuery semantics are very sensitive to the order of the produced output. Although pattern-tree based algebraic approaches are becoming more and more popular for evaluating XML, there is no universally accepted technique which can guarantee both a correct output order and a choice of efficient alternative plans.We address the problem using hybrid collections of trees that can be either sets or sequences or something in between. Each such collection is coupled with an Ordering Specification that describes how the trees are sorted (full, partial or no order). This provides us with a formal basis for developing a query plan having parts that maintain no order and parts with partial or full order.It turns out that duplicate elimination introduces some of the same issues as order maintenance: it is expensive and a single collection type does not always provide all the flexibility required to optimize this properly. To solve this problem we associate with each hybrid collection a Duplicate Specification that describes the presence or absence of duplicate elements in it. We show how to extend an existing bulk tree algebra, TLC [12], to use Ordering and Duplicate specifications and produce correctly ordered results. We also suggest some optimizations enabled by the flexibility of our approach, and experimentally demonstrate the performance increase due to them.