Mining and indexing graphs for supergraph search

Authors:
Dayu Yuan;Prasenjit Mitra;C. Lee Giles
Affiliations:
Department of Computer Science and Engineering, The Pennsylvania State University, University Park, PA;Department of Computer Science and Engineering and College of Information Sciences and Technology, The Pennsylvania State University, University Park, PA;Department of Computer Science and Engineering and College of Information Sciences and Technology, The Pennsylvania State University, University Park, PA
Venue:
Proceedings of the VLDB Endowment
Year:
2013

Abstract

We study supergraph search (SPS): given a query graph q and a graph database G that contains a collection of graphs, return the graphs in G that have q as a supergraph. SPS has broad applications in bioinformatics, cheminformatics, and other scientific and commercial fields. Determining whether a graph is a subgraph (or supergraph) of another is an NP-complete problem, so computing SPS over large graph databases is intractable. Two separate indexing methods, a "filter + verify"-based method and a "prefix-sharing"-based method, have been studied to compute SPS efficiently. Both methods mine subgraph patterns from the graph database to build an index, and the patterns are chosen to optimize either the filtering gain or the prefix-sharing gain. However, no existing subgraph-mining algorithm considers both gains. This work is the first to mine subgraphs that optimize both the filtering gain and the prefix-sharing gain when processing SPS queries. First, we show that the subgraph-mining problem is NP-hard. Then, we propose two polynomial-time algorithms that solve the problem with approximation ratios of 1 - 1/e and 1/4, respectively. In addition, we construct a lattice-like index, LW-index, to organize the selected subgraph patterns for fast index lookup. Our experiments show that our approach improves the query processing time for SPS queries by a factor of 3 to 10.
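To make the SPS definition and the "filter + verify" idea concrete, here is a minimal sketch in Python. It is not the paper's LW-index or its mining algorithm. The `Graph` class, the brute-force `is_subgraph` test, the hand-picked features, and the toy database are all assumptions introduced for illustration. The filter relies only on transitivity: if an indexed feature f is contained in a database graph g but not in the query q, then g cannot be a subgraph of q and can be skipped without verification.

```python
class Graph:
    """Tiny undirected, vertex-labeled graph (hypothetical helper type)."""
    def __init__(self, labels, edges):
        self.labels = dict(labels)                   # node id -> label
        self.edges = {frozenset(e) for e in edges}   # undirected edges


def is_subgraph(small, big):
    """Brute-force, non-induced subgraph isomorphism test by backtracking.

    Exponential in the worst case; suitable only for toy inputs.
    """
    s_nodes = list(small.labels)

    def extend(mapping, i):
        if i == len(s_nodes):
            return True
        u = s_nodes[i]
        for v in big.labels:
            if v in mapping.values() or small.labels[u] != big.labels[v]:
                continue
            # Every edge from u to an already-mapped node must exist in big.
            if all(frozenset((v, mapping[w])) in big.edges
                   for w in s_nodes[:i]
                   if frozenset((u, w)) in small.edges):
                mapping[u] = v
                if extend(mapping, i + 1):
                    return True
                del mapping[u]
        return False

    return extend({}, 0)


def build_index(database, features):
    """Map each feature position to the ids of database graphs containing it."""
    return {fi: {gid for gid, g in database.items() if is_subgraph(f, g)}
            for fi, f in enumerate(features)}


def supergraph_search(q, database, features, index):
    """Return ids of database graphs that are subgraphs of q."""
    candidates = set(database)
    # Filter: a feature missing from q rules out every graph that contains it.
    for fi, f in enumerate(features):
        if not is_subgraph(f, q):
            candidates -= index[fi]
    # Verify: run the expensive test only on the surviving candidates.
    return sorted(gid for gid in candidates if is_subgraph(database[gid], q))


# Toy data (assumed): small molecule-like graphs with atom labels.
database = {
    "g1": Graph({0: "C", 1: "C"}, [(0, 1)]),
    "g2": Graph({0: "C", 1: "O"}, [(0, 1)]),
    "g3": Graph({0: "C", 1: "N"}, [(0, 1)]),
    "g4": Graph({0: "C", 1: "C", 2: "O"}, [(0, 1), (1, 2)]),
}
features = [
    Graph({0: "C", 1: "O"}, [(0, 1)]),   # C-O edge
    Graph({0: "C", 1: "N"}, [(0, 1)]),   # C-N edge
]
index = build_index(database, features)

q = Graph({0: "C", 1: "C", 2: "O"}, [(0, 1), (1, 2)])   # query: C-C-O path
result = supergraph_search(q, database, features, index)
print(result)
```

In this toy run the C-N feature does not occur in q, so g3 is discarded during filtering and only g1, g2, and g4 reach the verification step. Which features to index is the choice the abstract frames as an optimization problem; the sketch simply fixes them by hand.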