Mining and indexing graphs for supergraph search

  • Authors:
  • Dayu Yuan;Prasenjit Mitra;C. Lee Giles

  • Affiliations:
  • Department of Computer Science and Engineering, The Pennsylvania State University, University Park, PA;Department of Computer Science and Engineering and College of Information Sciences and Technology, The Pennsylvania State University, University Park, PA;Department of Computer Science and Engineering and College of Information Sciences and Technology, The Pennsylvania State University, University Park, PA

  • Venue:
  • Proceedings of the VLDB Endowment
  • Year:
  • 2013

Quantified Score

Hi-index 0.00

Visualization

Abstract

We study supergraph search (SPS), that is, given a query graph q and a graph database G that contains a collection of graphs , return graphs that have q as a supergraph from G. SPS has broad applications in bioinformatics, cheminformatics and other scientific and commercial fields. Determining whether a graph is a subgraph (or supergraph) of another is an NP-complete problem. Hence, it is intractable to compute SPS for large graph databases. Two separate indexing methods, a "filter + verify"-based method and a "prefix-sharing"-based method, have been studied to efficiently compute SPS. To implement the above two methods, subgraph patterns are mined from the graph database to build an index. Those subgraphs are mined to optimize either the filtering gain or the prefix-sharing gain. However, no single subgraph-mining algorithm considers both gains. This work is the first one to mine subgraphs to optimize both the filtering gain and the prefix-sharing gain while processing SPS queries. First, we show that the subgraph-mining problem is NP-hard. Then, we propose two polynomial-time algorithms to solve the problem with an approximation ratio of 1-1/e and 1/4 respectively. In addition, we construct a lattice-like index, LW-index, to organize the selected subgraph patterns for fast index-lookup. Our experiments show that our approach improves the query processing time for SPS queries by a factor of 3 to 10.