Connected substructure similarity search

Authors:
Haichuan Shang;Xuemin Lin;Ying Zhang;Jeffrey Xu Yu;Wei Wang
Affiliations:
University of New South Wales and NICTA, Sydney, Australia;University of New South Wales and NICTA, Sydney, Australia;University of New South Wales, Sydney, Australia;Chinese University of Hong Kong, Hong Kong, China;University of New South Wales and NICTA, Sydney, Australia
Venue:
Proceedings of the 2010 ACM SIGMOD International Conference on Management of data
Year:
2010



Abstract

Substructure similarity search retrieves graphs that approximately contain a given query graph. It has many applications, e.g., detecting similar functions among chemical compounds. The problem is challenging because even testing subgraph containment between two graphs is NP-complete. Hence, existing techniques adopt the filtering-and-verification framework and focus on developing effective and efficient techniques to remove non-promising graphs. Nevertheless, existing filtering techniques may still be unable to remove many "low"-quality candidates. To resolve this, we propose a novel indexing technique, GrafD-Index, which indexes graphs according to their "distances" to features. We characterize a tight condition under which the distance-based triangular inequality holds. We then develop lower and upper bounding techniques that exploit the GrafD-Index to (1) prune non-promising graphs and (2) include graphs whose similarities are guaranteed to exceed the given similarity threshold. Because the verification phase is not well studied and plays the dominant role in the whole process, we devise efficient algorithms to verify candidates. A comprehensive experiment using real datasets demonstrates that our proposed methods significantly outperform existing methods.
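The pruning and inclusion steps mentioned in the abstract can be illustrated with a generic distance-based filter. The sketch below is not the GrafD-Index itself: the abstract does not define the distance function or the index layout, so the function `classify`, the dictionary-based index, the feature names, and the threshold `tau` are all hypothetical. It assumes a distance for which the triangular inequality holds (the abstract notes that this is only true under a specific condition) and phrases the threshold in terms of distance rather than similarity.

```python
from typing import Dict, List, Tuple


def classify(
    query_to_feature: Dict[str, float],
    index: Dict[str, Dict[str, float]],
    tau: float,
) -> Tuple[List[str], List[str], List[str]]:
    """Split graph ids into (pruned, accepted, candidates).

    Assumption: d is a distance satisfying the triangular inequality, so for
    any feature f:  |d(q, f) - d(g, f)| <= d(q, g) <= d(q, f) + d(g, f).
    """
    pruned: List[str] = []
    accepted: List[str] = []
    candidates: List[str] = []
    for gid, graph_to_feature in index.items():
        shared = query_to_feature.keys() & graph_to_feature.keys()
        # Tightest lower bound over all shared features.
        lower = max(
            (abs(query_to_feature[f] - graph_to_feature[f]) for f in shared),
            default=0.0,
        )
        # Tightest upper bound over all shared features.
        upper = min(
            (query_to_feature[f] + graph_to_feature[f] for f in shared),
            default=float("inf"),
        )
        if lower > tau:
            pruned.append(gid)       # cannot be within distance tau
        elif upper <= tau:
            accepted.append(gid)     # guaranteed within tau, no verification
        else:
            candidates.append(gid)   # bounds inconclusive, must be verified
    return pruned, accepted, candidates


# Hypothetical precomputed graph-to-feature distances.
index = {
    "g1": {"f1": 1.0, "f2": 6.0},
    "g2": {"f1": 0.5, "f2": 1.0},
    "g3": {"f1": 3.0, "f2": 2.0},
}
query = {"f1": 1.0, "f2": 1.0}
print(classify(query, index, tau=2.0))
```

Only the graphs that land in the third list would go on to the expensive verification step, which is the part of the process the abstract identifies as dominant.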