G-Tries: a data structure for storing and finding subgraphs

Authors:
Pedro Ribeiro;Fernando Silva
Affiliations:
CRACS & INESC-TEC, Faculdade de Ciencias, Universidade do Porto, Porto, Portugal 4169-007;CRACS & INESC-TEC, Faculdade de Ciencias, Universidade do Porto, Porto, Portugal 4169-007
Venue:
Data Mining and Knowledge Discovery
Year:
2014

Abstract

The ability to find and count subgraphs of a given network is an important, nontrivial task with multidisciplinary applicability. Discovering network motifs and computing graphlet signatures are two examples of methodologies that rely, at their core, on the subgraph counting problem. Here we present the g-trie, a data structure specifically designed for discovering subgraph frequencies. We produce a tree that encapsulates the structure of an entire set of graphs, taking advantage of common topologies in the same way a prefix tree takes advantage of common prefixes. This avoids redundancy in the representation of the graphs, saving both memory and computation time. We introduce a specialized canonical labeling designed to highlight common substructures, and we annotate the g-trie with a set of conditional rules that break symmetries, so that the same occurrence is not computed more than once. We also introduce a novel algorithm that takes as input a set of small graphs and efficiently finds and counts them as induced subgraphs of a larger network. We perform an extensive empirical evaluation of our algorithms, focusing on efficiency and scalability on a diverse set of complex networks. Results show that g-tries outperform previously existing algorithms by at least one order of magnitude.
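
The prefix-tree idea in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation, and several simplifications are assumed: graphs are undirected and given as adjacency matrices in a fixed vertex order (the paper's canonical labeling is not reproduced), the stored graphs are assumed to be pairwise non-isomorphic, and the symmetry-breaking rules are replaced by simply deduplicating matched vertex sets. The names GTrieNode, insert, count_nodes and census are hypothetical.

```python
class GTrieNode:
    """One trie level per graph vertex (hypothetical simplified layout)."""
    def __init__(self):
        self.children = {}      # tuple of 0/1 flags -> child node
        self.graph_name = None  # set when a stored graph ends at this node


def insert(root, name, adj):
    """Insert a graph given as a symmetric 0/1 adjacency matrix.

    Vertex k is described by its links to vertices 0..k-1, so graphs whose
    first vertices are connected in the same way share a path in the trie.
    """
    node = root
    for k in range(len(adj)):
        row = tuple(adj[k][:k])
        node = node.children.setdefault(row, GTrieNode())
    node.graph_name = name


def count_nodes(node):
    """Number of trie nodes, including the root."""
    return 1 + sum(count_nodes(c) for c in node.children.values())


def census(root, network):
    """Count induced occurrences of every stored graph in `network`.

    `network` maps each vertex to the set of its neighbours. Assumption:
    instead of symmetry-breaking conditions, matched vertex sets are
    deduplicated, which is simpler but does redundant work.
    """
    found = {}  # graph name -> set of matched vertex sets

    def extend(node, matched):
        if node.graph_name is not None:
            found.setdefault(node.graph_name, set()).add(frozenset(matched))
        for row, child in node.children.items():
            for v in network:
                if v in matched:
                    continue
                # v must link to exactly the matched vertices the row asks
                # for; forbidding extra links keeps the match induced.
                if all((u in network[v]) == bool(bit)
                       for u, bit in zip(matched, row)):
                    extend(child, matched + [v])

    extend(root, [])
    return {name: len(sets) for name, sets in found.items()}


# Two 3-vertex graphs that share the prefix "vertex 1 is linked to vertex 0".
path3 = [[0, 1, 0],
         [1, 0, 1],
         [0, 1, 0]]
triangle = [[0, 1, 1],
            [1, 0, 1],
            [1, 1, 0]]

root = GTrieNode()
insert(root, "path3", path3)
insert(root, "triangle", triangle)

# A triangle on vertices 0, 1, 2 with a pendant vertex 3 attached to 2.
network = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}

counts = census(root, network)
print(count_nodes(root))  # 5 nodes, versus 7 if the graphs were stored separately
print(counts)             # {'path3': 2, 'triangle': 1} (key order may vary)
```

In this toy version every occurrence is reached once per automorphism of the matched graph and then discarded as a duplicate. The conditional rules described in the abstract are meant to avoid that repeated work during the search itself.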