Indexing graph-structured XML data for efficient structural join operation

  • Authors:
  • Qun Chen;Andrew Lim;Kian Win Ong;Ji Qing Tang

  • Affiliations:
  • Department of Industrial Engineering and Engineering Management, Hong Kong University of Science and Technology, Clear Water Bay, Kowloon, Hong Kong;Department of Industrial Engineering and Engineering Management, Hong Kong University of Science and Technology, Clear Water Bay, Kowloon, Hong Kong;Department of Computer Science and Engineering, University of California at San Diego, La Jolla, CA;Department of Industrial Engineering and Engineering Management, Hong Kong University of Science and Technology, Clear Water Bay, Kowloon, Hong Kong

  • Venue:
  • Data & Knowledge Engineering
  • Year:
  • 2006

Quantified Score

Hi-index 0.00

Visualization

Abstract

Structural join has been established as a primitive technique for matching the binary containment pattern, specifically the parent-child and ancestor-descendant relationship, on the tree XML data. While current indexing approaches and evaluation algorithms proposed for the structural join operation assume the tree-structured data model, the presence of reference links in XML documents may render the underlying model a graph instead. In the more general category of semi-structured data, of which XML is an example, the data model is also usually supposed to be of graph structure. In this paper, we present an indexing approach and corresponding evaluation algorithms for efficiently performing the structural join operation on graph-structured data. Our approach encodes the structural containment relationship of a graph on multiple nested tree-structured layers, probably with the exception of the last one. With each tree-structured layer indexed with the inverted technique, the structural join operation on a graph can therefore be accomplished through recursively performing structural joins on nested layer trees. Our extensive experiments on both benchmark and synthetic XML data indicate that our proposed approach has good potential to perform significantly better than existing ones in term of both the I/O and CPU cost.