Feature-based similarity search in graph structures

Authors:
Xifeng Yan;Feida Zhu;Philip S. Yu;Jiawei Han
Affiliations:
University of Illinois at Urbana-Champaign, Urbana, IL;University of Illinois at Urbana-Champaign, Urbana, IL;IBM T. J. Watson Research Center, Hawthorne, NY;University of Illinois at Urbana-Champaign, Urbana, IL
Venue:
ACM Transactions on Database Systems (TODS)
Year:
2006

Abstract

Similarity search of complex structures is an important operation in graph-related applications since exact matching is often too restrictive. In this article, we investigate the issues of substructure similarity search using indexed features in graph databases. By transforming the edge relaxation ratio of a query graph into the maximum allowed feature misses, our structural filtering algorithm can filter graphs without performing pairwise similarity computation. It is further shown that using either too few or too many features can result in poor filtering performance. Thus the challenge is to design an effective feature set selection strategy that could maximize the filtering capability. We prove that the complexity of optimal feature set selection is Ω(2^m) in the worst case, where m is the number of features for selection. In practice, we identify several criteria to build effective feature sets for filtering, and demonstrate that combining features with similar size and selectivity can improve the filtering and search performance significantly within a multifilter composition framework. The proposed feature-based filtering concept can be generalized and applied to searching approximate nonconsecutive sequences, trees, and other structured data as well.
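
The filtering idea in the abstract can be illustrated with a minimal Python sketch. It is not the article's algorithm: the abstract gives no implementation details, so every name here (`max_feature_misses`, `feature_misses`, `filter_candidates`, the toy data) is hypothetical, and the bound is computed by brute force over all k-edge subsets, which is only feasible for tiny queries. The sketch assumes the edge relaxation ratio has already been converted into a number k of edges that may be relaxed, and that each query edge is mapped to the feature occurrences that contain it.

```python
from itertools import combinations

def max_feature_misses(edge_features, k):
    """Largest number of feature occurrences that relaxing any k edges can destroy.

    edge_features maps each query edge to the set of feature occurrences
    containing that edge (a hypothetical representation for this sketch).
    Brute force over all k-edge subsets, so exponential in general.
    """
    edges = list(edge_features)
    k = min(k, len(edges))
    best = 0
    for subset in combinations(edges, k):
        hit = set()
        for edge in subset:
            hit |= edge_features[edge]
        best = max(best, len(hit))
    return best

def feature_misses(query_counts, graph_counts):
    """Number of query feature occurrences that the target graph lacks."""
    return sum(max(0, c - graph_counts.get(f, 0)) for f, c in query_counts.items())

def filter_candidates(query_counts, database, d_max):
    """Keep graphs whose feature misses do not exceed the allowed bound."""
    return [gid for gid, counts in database.items()
            if feature_misses(query_counts, counts) <= d_max]

# Toy query: three edges, features a (two occurrences), b and c (one each).
edge_features = {
    "e1": {"a#1", "b#1"},
    "e2": {"a#2", "b#1", "c#1"},
    "e3": {"c#1"},
}
query_counts = {"a": 2, "b": 1, "c": 1}

# Toy database: feature occurrence counts per graph, as an index might store them.
database = {
    "g1": {"a": 2, "b": 1, "c": 1},
    "g2": {"a": 1},
    "g3": {},
}

d_max = max_feature_misses(edge_features, k=1)
candidates = filter_candidates(query_counts, database, d_max)
```

In this toy setup only feature counts are compared, so no pairwise graph similarity computation is needed to discard a graph. Graphs that survive the filter are candidates only and would still need verification against the query.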