Substructure similarity search in graph databases

Authors:
Xifeng Yan;Philip S. Yu;Jiawei Han
Affiliations:
University of Illinois at Urbana-Champaign;IBM T. J. Watson Research Center;University of Illinois at Urbana-Champaign
Venue:
Proceedings of the 2005 ACM SIGMOD international conference on Management of data
Year:
2005

Abstract

Advanced database systems face a great challenge raised by the emergence of massive, complex structural data in bioinformatics, chem-informatics, and many other applications. The most fundamental support needed in these applications is the efficient search of complex structured data. Since exact matching is often too restrictive, similarity search of complex structures becomes a vital operation that must be supported efficiently.

In this paper, we investigate the issues of substructure similarity search using indexed features in graph databases. By transforming the edge relaxation ratio of a query graph into the maximum allowed missing features, our structural filtering algorithm, called Grafil, can filter many graphs without performing pairwise similarity computations. It is further shown that using either too few or too many features can result in poor filtering performance. Thus the challenge is to design an effective feature set selection strategy for filtering. By examining the effect of different feature selection mechanisms, we develop a multi-filter composition strategy, where each filter uses a distinct and complementary subset of the features. We identify the criteria to form effective feature sets for filtering, and demonstrate that combining features with similar size and selectivity can improve the filtering and search performance significantly. Moreover, the concept presented in Grafil can be applied to searching approximate non-consecutive sequences, trees, and other complicated structures as well.
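The filtering idea in the abstract, turning an edge relaxation budget into a bound on missing features, can be illustrated with a small sketch. This is not the Grafil algorithm itself. It is a simplified illustration under several assumptions: the relaxation is given as a count `k` of query edges that may be dropped, each feature occurrence in the query is listed with the query edges it uses, and the bound is a loose one obtained by summing the `k` largest per-edge hit counts. All function and variable names (`max_missed_upper_bound`, `feature_misses`, `filter_candidates`, `embeddings`, `database`) are hypothetical.

```python
from collections import Counter


def max_missed_upper_bound(embeddings, k):
    """Upper bound on feature occurrences lost when any k query edges are dropped.

    `embeddings` is a list of (feature_name, edges_used) pairs, one per
    occurrence of a feature in the query graph (an assumed representation).
    Dropping an edge destroys every occurrence that uses it, so the loss from
    k edges is at most the sum of the k largest per-edge hit counts.
    """
    hits = Counter()
    for _feature, edges in embeddings:
        for edge in edges:
            hits[edge] += 1
    top_k = sorted(hits.values(), reverse=True)[:k]
    # The loss can never exceed the total number of occurrences.
    return min(sum(top_k), len(embeddings))


def feature_misses(query_counts, graph_counts):
    """Number of query feature occurrences that the database graph lacks."""
    return sum(max(0, count - graph_counts.get(feature, 0))
               for feature, count in query_counts.items())


def filter_candidates(embeddings, k, database):
    """Keep only graphs whose feature misses stay within the bound.

    Graphs that miss more features than the bound allows cannot contain the
    query with at most k edges relaxed, so they are pruned without any
    pairwise similarity computation. Survivors still need verification.
    """
    query_counts = Counter(feature for feature, _ in embeddings)
    d_max = max_missed_upper_bound(embeddings, k)
    return [graph_id for graph_id, counts in database.items()
            if feature_misses(query_counts, counts) <= d_max]


# Toy query: a triangle with edges e1, e2, e3 (illustrative data only).
embeddings = [
    ("edge", {"e1"}), ("edge", {"e2"}), ("edge", {"e3"}),
    ("path2", {"e1", "e2"}), ("path2", {"e2", "e3"}), ("path2", {"e1", "e3"}),
]
# Precomputed feature counts per database graph (assumed to come from an index).
database = {
    "g1": {"edge": 3, "path2": 3},
    "g2": {"edge": 2, "path2": 1},
    "g3": {"edge": 1, "path2": 0},
}
candidates = filter_candidates(embeddings, 1, database)
print(candidates)
```

With one relaxed edge, each edge of the toy triangle participates in three feature occurrences, so the bound is 3; `g3` misses five occurrences and is pruned, while `g1` and `g2` remain as candidates for verification. The summed bound is deliberately loose so that no qualifying graph is dismissed; a tighter bound would prune more graphs at the cost of more computation.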