Efficient processing of XML twig patterns with parent child edges: a look-ahead approach

  • Authors:
  • Jiaheng Lu;Ting Chen;Tok Wang Ling

  • Affiliations:
  • National University of Singapore, Singapore;National University of Singapore, Singapore;National University of Singapore, Singapore

  • Venue:
  • Proceedings of the thirteenth ACM international conference on Information and knowledge management
  • Year:
  • 2004

Quantified Score

Hi-index 0.00

Visualization

Abstract

With the growing importance of semi-structure data in information exchange, much research has been done to provide an effective mechanism to match a twig query in an XML database. A number of algorithms have been proposed recently to process a twig query holistically. Those algorithms are quite efficient for quires with only ancestor-descendant edges. But for queries with mixed ancestor-descendant and parent-child edges, the previous approaches still may produce large intermediate results, even when the input and output size are more manageable. To overcome this limitation, in this paper, we propose a novel holistic twig join algorithm, namely TwigStackList. Our main technique is to look-ahead read some elements in input data steams and cache limited number of them to lists in the main memory. The number of elements in any list is bounded by the length of the longest path in the XML document. We show that TwigStackList is I/O optimal for queries with only ancestor-descendant relationships below branching nodes. Further, even when queries contain parent-child relationship below branching nodes, the set of intermediate results in TwigStackList is guaranteed to be a subset of that in previous algorithms. We complement our experimental results on a range of real and synthetic data to show the significant superiority of TwigStackList over previous algorithms for queries with parent-child relationships.