TwigBuffer: avoiding useless intermediate solutions completely in twig joins

  • Authors:
  • Jiang Li;Junhu Wang

  • Affiliations:
  • School of Information and Communication Technology, Griffith University, Gold Coast, Australia;School of Information and Communication Technology, Griffith University, Gold Coast, Australia

  • Venue:
  • DASFAA'08 Proceedings of the 13th international conference on Database systems for advanced applications
  • Year:
  • 2008

Quantified Score

Hi-index 0.00

Visualization

Abstract

Twig pattern matching plays a crucial role in XML data processing. TwigStack [2] is a holistic twig join algorithm that solves the problem in two steps: (1) finding potentially useful intermediate path solutions, (2) merging the intermediate solutions. The algorithm is optimal when the twig pattern has only //-edges, in the sense that no useless partial solutions are generated in the first step (thus expediting the second step and boosting the overall performance). However, when /-edges are present, a large set of useless partial solutions may be produced, which directly downgrades the overall performance. Recently, some improved versions of the algorithm (e.g., TwigStackList and iTwigJoin) have been proposed in an attempt to reduce the number of useless partial solutions when /-edges are involved. However, none of the algorithms can avoid useless partial solutions completely. In this paper, we propose a new algorithm, TwigBuffer, that is guaranteed to completely avoid the useless partial solutions. Our algorithm is based on a novel strategy to buffer and manipulate elements in stacks and lists. Experiments show that TwigBuffer significantly outperforms previous algorithms when arbitrary /-edges are present.