Finding top-k similar graphs in graph databases

Authors:
Yuanyuan Zhu;Lu Qin;Jeffrey Xu Yu;Hong Cheng
Affiliations:
The Chinese University of Hong Kong, Hong Kong;The Chinese University of Hong Kong, Hong Kong;The Chinese University of Hong Kong, Hong Kong;The Chinese University of Hong Kong, Hong Kong
Venue:
Proceedings of the 15th International Conference on Extending Database Technology
Year:
2012

Abstract

Similarity search over graph databases has been widely studied in graph query processing in recent years. Existing work focuses mainly on subgraph similarity search and supergraph similarity search. In this paper, we study the problem of finding the top-k graphs in a graph database that are most similar to a query graph. This problem has many applications, such as image retrieval and chemical compound structure search. Feature-based and kernel-based similarity measures have been used in the literature, but such measures are coarse and may lose the connectivity information among substructures. We introduce a new similarity measure based on the maximum common subgraph (MCS) of two graphs and show that it better captures the common and differing structures of two graphs. Since computing the MCS of two graphs is NP-hard, we propose an algorithm that answers the top-k graph similarity query using two distance lower bounds with different computational costs, in order to reduce the number of MCS computations. We further introduce an indexing technique that exploits the triangle property of similarities among graphs in the database to obtain tighter lower bounds. Three indexing methods are proposed, with different tradeoffs between pruning power and construction cost. We conducted extensive performance studies on large real datasets to evaluate our approaches.
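The idea of using a cheap distance lower bound to avoid expensive exact computations can be illustrated with a small sketch. This is a generic filter-and-verify loop, not the algorithm from the paper. The function names (`top_k_with_lower_bound`, `toy_distance`, `toy_lower_bound`) are hypothetical. The toy distance treats each graph as a set of edges over already aligned vertices, so it is only a stand-in for an MCS-based distance and involves no isomorphism search.

```python
import heapq

def toy_distance(a, b):
    # Stand-in for an expensive exact distance (assumption: vertices are
    # already aligned, so the "common subgraph" is just the shared edge set).
    return len(a) + len(b) - 2 * len(a & b)

def toy_lower_bound(a, b):
    # Cheap bound: the edge-count difference never exceeds toy_distance.
    return abs(len(a) - len(b))

def top_k_with_lower_bound(query, database, k, lower_bound, exact_distance):
    """Return (sorted [(distance, index)], number of exact computations)."""
    # Visit candidates in ascending order of their cheap lower bound.
    candidates = sorted((lower_bound(query, g), i) for i, g in enumerate(database))
    best = []  # max-heap on distance, stored as (-distance, index)
    exact_calls = 0
    for lb, i in candidates:
        # Once k answers are held and the bound cannot beat the k-th best,
        # no remaining candidate can either, so stop.
        if len(best) == k and lb >= -best[0][0]:
            break
        d = exact_distance(query, database[i])
        exact_calls += 1
        if len(best) < k:
            heapq.heappush(best, (-d, i))
        elif d < -best[0][0]:
            heapq.heapreplace(best, (-d, i))
    return sorted((-nd, i) for nd, i in best), exact_calls

query = frozenset({(1, 2), (2, 3), (3, 4)})
database = [
    frozenset({(1, 2), (2, 3), (3, 4)}),
    frozenset({(1, 2), (2, 3)}),
    frozenset({(1, 2), (2, 3), (3, 4), (4, 5), (5, 6), (6, 7)}),
    frozenset({(7, 8)}),
    frozenset({(1, 2), (5, 6), (6, 7)}),
]
result, exact_calls = top_k_with_lower_bound(
    query, database, 2, toy_lower_bound, toy_distance
)
```

In this toy run, three of the five exact distances are computed before the loop stops. The saving depends entirely on how tight the lower bound is, which is why tighter bounds matter.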