GADDI: distance index based subgraph matching in biological networks

Authors:
Shijie Zhang;Shirong Li;Jiong Yang
Affiliations:
Case Western Reserve University, Cleveland, OH;Case Western Reserve University, Cleveland, OH;Case Western Reserve University, Cleveland, OH
Venue:
Proceedings of the 12th International Conference on Extending Database Technology: Advances in Database Technology
Year:
2009



Abstract

A large amount of biological data can be naturally represented as graphs, e.g., protein interaction networks and gene regulatory networks. Indexing large graphs is therefore an urgent research problem of great practical importance. The main challenge is size: each graph may contain thousands of vertices or more. Most previous work focuses on indexing a set of small or medium-sized database graphs (with only tens of vertices) and determining whether a query graph occurs in any of them. In this paper, we are interested in finding all matches of a query graph in a given large graph with thousands of vertices, an important task in many biological applications. This increases the complexity significantly. We propose a novel distance measurement that reintroduces the idea of frequent substructures in a single large graph. We devise a novel structure-distance-based approach (GADDI) to efficiently find matches of the query graph. GADDI is further optimized with a dynamic matching scheme that minimizes redundant calculations. Finally, a number of real and synthetic data sets are used to evaluate the efficiency and scalability of the proposed method.