Graph indexing: tree + delta

  • Authors: Peixiang Zhao; Jeffrey Xu Yu; Philip S. Yu

  • Affiliations: The Chinese University of Hong Kong; The Chinese University of Hong Kong; IBM T. J. Watson Research Center

  • Venue: VLDB '07: Proceedings of the 33rd International Conference on Very Large Data Bases
  • Year: 2007

Abstract

Recent scientific and technological advances have produced an abundance of structural patterns modeled as graphs. As a result, it is of special interest to process graph containment queries effectively on large graph databases. Given a graph database G and a query graph q, the graph containment query retrieves all graphs in G that contain q as a subgraph. Because of the vast number of graphs in G and the inherent complexity of subgraph isomorphism testing, it is desirable to use high-quality graph indexing mechanisms to reduce the overall query processing cost. In this paper, we propose a new cost-effective graph indexing method based on frequent tree-features of the graph database. We analyze the effectiveness and efficiency of trees as indexing features from three critical aspects: feature size, feature selection cost, and pruning power. To achieve better pruning ability than existing graph-based indexing methods, we select, in addition to frequent tree-features (Tree), a small number of discriminative graphs (Δ) on demand, without a costly graph mining process beforehand. Our study verifies that (Tree+Δ) is a better choice than graphs for indexing purposes, denoted (Tree+Δ ≥ Graph), when addressing the graph containment query problem. This has two implications: (1) index construction with (Tree+Δ) is efficient, and (2) graph containment query processing with (Tree+Δ) is efficient. Our experimental studies demonstrate that (Tree+Δ) has a compact index structure, achieves an order of magnitude better performance in index construction, and, most importantly, outperforms the up-to-date graph-based indexing methods gIndex and C-Tree in graph containment query processing.
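
The filter-then-verify pattern behind a graph containment query can be illustrated with a minimal sketch. This is not the (Tree+Δ) algorithm from the paper. It assumes a toy setting in which the only indexed features are labeled edges (the smallest possible trees), and every name in it (`make_graph`, `edge_features`, `build_index`, `contains`, `containment_query`) is hypothetical. The index prunes graphs that lack some feature of q, and a backtracking subgraph isomorphism test verifies the surviving candidates.

```python
def make_graph(labels, edges):
    """Undirected labeled graph: labels is {node: label}, edges is a list of (u, v)."""
    adj = {u: set() for u in labels}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return {"labels": dict(labels), "adj": adj}


def edge_features(g):
    """Labeled edges as features (assumption: a stand-in for mined frequent trees)."""
    feats = set()
    for u, nbrs in g["adj"].items():
        for v in nbrs:
            feats.add(tuple(sorted((g["labels"][u], g["labels"][v]))))
    return feats


def build_index(db):
    """Inverted index: feature -> ids of the database graphs that contain it."""
    index = {}
    for gid, g in db.items():
        for f in edge_features(g):
            index.setdefault(f, set()).add(gid)
    return index


def contains(g, q):
    """Backtracking test: is q isomorphic to a (non-induced) subgraph of g?"""
    q_nodes = list(q["labels"])

    def extend(mapping):
        if len(mapping) == len(q_nodes):
            return True
        u = q_nodes[len(mapping)]
        used = set(mapping.values())
        for v in g["labels"]:
            if v in used or g["labels"][v] != q["labels"][u]:
                continue
            # Every already-mapped neighbor of u must land on a neighbor of v.
            if all(mapping[w] in g["adj"][v] for w in q["adj"][u] if w in mapping):
                mapping[u] = v
                if extend(mapping):
                    return True
                del mapping[u]
        return False

    return extend({})


def containment_query(db, index, q):
    """Filter with the feature index, then verify each remaining candidate."""
    candidates = set(db)
    for f in edge_features(q):
        candidates &= index.get(f, set())
    answers = {gid for gid in candidates if contains(db[gid], q)}
    return candidates, answers


# Toy database: a C-C-O triangle, a C-C-O path, and a C-O-N path.
db = {
    "g1": make_graph({1: "C", 2: "C", 3: "O"}, [(1, 2), (2, 3), (1, 3)]),
    "g2": make_graph({1: "C", 2: "C", 3: "O"}, [(1, 2), (2, 3)]),
    "g3": make_graph({1: "C", 2: "O", 3: "N"}, [(1, 2), (2, 3)]),
}
index = build_index(db)
q = make_graph({"a": "C", "b": "C", "c": "O"}, [("a", "b"), ("b", "c"), ("a", "c")])

candidates, answers = containment_query(db, index, q)
print(sorted(candidates))  # ['g1', 'g2']
print(sorted(answers))     # ['g1']
```

In this toy run the index prunes g3, but g2 survives filtering and is rejected only by the expensive verification step, because edge features alone cannot tell the cyclic query apart from a path. False positives of this kind are what the abstract's additional discriminative graphs (Δ) are intended to reduce.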