Fast structural join with a location function

Authors:
Nan Tang;Jeffrey Xu Yu;Kam-Fai Wong;Haifeng Jiang
Affiliations:
The Chinese University of Hong Kong, Hong Kong, China;The Chinese University of Hong Kong, Hong Kong, China;The Chinese University of Hong Kong, Hong Kong, China;IBM Almaden Research Center, San Jose
Venue:
DASFAA'06 Proceedings of the 11th international conference on Database Systems for Advanced Applications
Year:
2006

Abstract

A structural join evaluates a structural relationship (parent-child or ancestor-descendant) between XML elements. It serves as an important computation unit in XML pattern matching, such as twig joins. There is a large body of work on efficient structural joins. In particular, indexes can expedite structural joins by skipping unmatchable elements. A typical use of indexes is to retrieve, for a given element, all its ancestor (or descendant) elements from an indexed set. However, we observe two possible limitations with such index probes, namely false hit and false locate. A false hit means that an index probe touches unnecessary data besides the real results; a false locate is a (wasted) probe that has zero answers. Both false hits and false locates negatively affect the efficiency of structural joins. In this paper, we challenge ourselves to develop a new structural join algorithm with no false hit and no false locate. We illustrate that the R-Tree has the no-false-hit property (in contrast to the B+-Tree) and hence is a good candidate for our algorithm. For no false locate, we propose a new function called Location, which tells the probing points that will result in matches. We design and implement the Location function using a space-efficient structure, and present our algorithm using the R-Tree together with the Location function. Extensive experiments show the efficiency of our algorithm.
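
To make the terms concrete, the following is a minimal sketch of a structural join that counts probes returning zero answers, which is what the abstract calls a false locate. It assumes the common region encoding (start, end, level) for XML elements, which the abstract itself does not spell out. The names Element, is_ancestor, is_parent and naive_structural_join are hypothetical, and the linear scan stands in for an index probe. It is not the R-Tree and Location algorithm proposed in the paper.

```python
from typing import List, NamedTuple, Tuple


class Element(NamedTuple):
    """An XML element under an assumed (start, end, level) region encoding."""
    tag: str
    start: int
    end: int
    level: int


def is_ancestor(a: Element, d: Element) -> bool:
    # a is an ancestor of d when a's region strictly contains d's region.
    return a.start < d.start and d.end < a.end


def is_parent(a: Element, d: Element) -> bool:
    # A parent is an ancestor exactly one level above.
    return is_ancestor(a, d) and a.level + 1 == d.level


def naive_structural_join(
    ancestors: List[Element],
    descendants: List[Element],
    parent_child: bool = False,
) -> Tuple[List[Tuple[Element, Element]], int]:
    """Probe the ancestor set once per descendant (linear scan, no index).

    Returns the matching pairs and the number of probes with zero
    answers, i.e. the wasted probes the abstract calls false locates.
    """
    test = is_parent if parent_child else is_ancestor
    matches: List[Tuple[Element, Element]] = []
    wasted_probes = 0
    for d in descendants:
        hits = [a for a in ancestors if test(a, d)]
        if not hits:
            wasted_probes += 1
        matches.extend((a, d) for a in hits)
    return matches, wasted_probes


# Hypothetical data: one <a> element and three <c> elements.
a = Element("a", 1, 8, 1)
c1 = Element("c", 3, 4, 3)   # nested two levels below a
c2 = Element("c", 6, 7, 2)   # direct child of a
c3 = Element("c", 9, 10, 1)  # outside a

ad_matches, ad_wasted = naive_structural_join([a], [c1, c2, c3])
pc_matches, pc_wasted = naive_structural_join([a], [c1, c2, c3], parent_child=True)
```

In this sketch the probe for c3 finds no ancestor in either join, and the probe for c1 also comes back empty in the parent-child join. A function like the Location function described in the abstract would aim to avoid issuing such probes in the first place.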