Indexing graph-structured XML data for efficient structural join operation

Authors:
Qun Chen;Andrew Lim;Kian Win Ong;Ji Qing Tang
Affiliations:
Department of Industrial Engineering and Engineering Management, Hong Kong University of Science and Technology, Clear Water Bay, Kowloon, Hong Kong;Department of Industrial Engineering and Engineering Management, Hong Kong University of Science and Technology, Clear Water Bay, Kowloon, Hong Kong;Department of Computer Science and Engineering, University of California at San Diego, La Jolla, CA;Department of Industrial Engineering and Engineering Management, Hong Kong University of Science and Technology, Clear Water Bay, Kowloon, Hong Kong
Venue:
Data & Knowledge Engineering
Year:
2006

Abstract

Structural join has been established as a primitive technique for matching binary containment patterns, specifically the parent-child and ancestor-descendant relationships, on tree-structured XML data. While current indexing approaches and evaluation algorithms proposed for the structural join operation assume a tree-structured data model, the presence of reference links in XML documents may render the underlying model a graph instead. In the more general category of semi-structured data, of which XML is an example, the data model is also usually assumed to be a graph. In this paper, we present an indexing approach and corresponding evaluation algorithms for efficiently performing the structural join operation on graph-structured data. Our approach encodes the structural containment relationships of a graph on multiple nested tree-structured layers, possibly with the exception of the last one. With each tree-structured layer indexed using the inverted technique, the structural join operation on a graph can be accomplished by recursively performing structural joins on the nested layer trees. Our extensive experiments on both benchmark and synthetic XML data indicate that the proposed approach has good potential to perform significantly better than existing ones in terms of both I/O and CPU cost.
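
To make the tree-case primitive concrete, the following is a minimal sketch of a stack-based structural join over a single tree. It is not the paper's graph method: it only illustrates the ancestor-descendant and parent-child matching that the abstract says is applied to each tree-structured layer. The `(start, end, level)` region encoding, the `Node` type, and the function name `structural_join` are assumptions introduced for illustration; the abstract does not specify how nodes are encoded.

```python
from typing import List, NamedTuple, Tuple


class Node(NamedTuple):
    """Hypothetical region-encoded element: [start, end] spans its subtree."""
    start: int
    end: int
    level: int
    tag: str


def structural_join(
    ancestors: List[Node],
    descendants: List[Node],
    parent_child: bool = False,
) -> List[Tuple[Node, Node]]:
    """Return (ancestor, descendant) pairs where the ancestor's region
    strictly contains the descendant's region.

    Assumes both input lists come from the same tree and are sorted by
    `start`, so regions are either nested or disjoint.
    """
    pairs: List[Tuple[Node, Node]] = []
    stack: List[Node] = []  # chain of nested candidate ancestors
    i = 0
    for d in descendants:
        # Push every candidate ancestor that opens before d.
        while i < len(ancestors) and ancestors[i].start < d.start:
            a = ancestors[i]
            # Drop candidates that closed before a opens.
            while stack and stack[-1].end < a.start:
                stack.pop()
            stack.append(a)
            i += 1
        # Drop candidates that closed before d opens.
        while stack and stack[-1].end < d.start:
            stack.pop()
        # Everything left on the stack contains d.
        for a in stack:
            if not parent_child or a.level + 1 == d.level:
                pairs.append((a, d))
    return pairs


# Toy tree: <a><b><c/></b><c/></a><b><c/></b>
a_nodes = [Node(1, 10, 1, "a")]
b_nodes = [Node(2, 5, 2, "b"), Node(11, 14, 1, "b")]
c_nodes = [Node(3, 4, 3, "c"), Node(6, 7, 2, "c"), Node(12, 13, 2, "c")]

a_desc_c = structural_join(a_nodes, c_nodes)                      # a//c
a_child_c = structural_join(a_nodes, c_nodes, parent_child=True)  # a/c
b_desc_c = structural_join(b_nodes, c_nodes)                      # b//c
```

Because each input list is scanned once and the stack only holds currently open candidates, the cost of this sketch is linear in the input sizes plus the number of output pairs. Reference links break the nested-or-disjoint property the stack relies on, which is why a graph needs additional machinery such as the layered encoding described above.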