Efficient subgraph frequency estimation with g-tries

Authors:
Pedro Ribeiro; Fernando Silva
Affiliations:
CRACS & INESC-Porto LA, Faculdade de Ciências, Universidade do Porto, Portugal (both authors)
Venue:
WABI'10: Proceedings of the 10th International Conference on Algorithms in Bioinformatics
Year:
2010

Abstract

Many biological networks contain recurring, overrepresented elements called network motifs. Finding these substructures is a computationally hard task related to graph isomorphism. G-tries are a data structure based on multiway trees that efficiently captures the common substructures in a set of subgraphs. They are highly effective at constraining the search space when finding the occurrences of those subgraphs in a larger original graph, yielding speedups of up to 100 times over previous methods that aim for exact and complete results. In this paper we present a new, efficient sampling algorithm for subgraph frequency estimation based on g-tries. It uniformly traverses a fraction of the search space, providing an accurate, unbiased estimate of subgraph frequencies. Our results show that, given the same amount of time, our algorithm achieves better precision than previous methods because it sustains higher sampling speeds.
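The abstract describes the estimator only at a high level. To illustrate why traversing a random fraction of the search space can still give unbiased frequencies, the sketch below swaps in a simpler, well-known technique: level-wise probabilistic pruning of a plain ESU enumeration tree, in the style of Wernicke's RAND-ESU. It does not build a g-trie and should not be read as the authors' algorithm. A tree node at depth d is explored with probability `probs[d - 1]`; because each connected k-vertex subgraph appears as exactly one leaf, it is reached with probability equal to the product of all entries, and dividing the sampled counts by that product gives an unbiased estimate. The helpers `build_adj`, `canonical_label` and `sample_census` are hypothetical names, and the brute-force labelling stands in for the work a g-trie would avoid by matching partial subgraphs against its stored topologies.

```python
import random
from collections import Counter
from itertools import combinations, permutations


def build_adj(edges):
    """Undirected adjacency sets from an edge list (hypothetical helper)."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def canonical_label(nodes, adj):
    """Brute-force isomorphism-class label of the subgraph induced by `nodes`:
    the smallest upper-triangle adjacency tuple over all vertex orderings.
    Exponential in k, so only usable for very small subgraphs."""
    return min(
        tuple(int(b in adj[a]) for a, b in combinations(perm, 2))
        for perm in permutations(nodes)
    )


def sample_census(adj, k, probs, rng):
    """Estimate counts of connected k-vertex induced subgraphs per class.

    Enumerates with ESU (each connected k-subgraph is exactly one leaf of
    the enumeration tree) but explores a tree node at depth d only with
    probability probs[d - 1]. Each leaf is therefore reached with
    probability prod(probs), and dividing by it gives an unbiased estimate.
    """
    assert len(probs) == k and all(0.0 < p <= 1.0 for p in probs)
    reach = 1.0
    for p in probs:
        reach *= p
    counts = Counter()

    def extend(sub, ext, root):
        if len(sub) == k:
            counts[canonical_label(sub, adj)] += 1
            return
        # Vertices already in, or adjacent to, the current subgraph.
        closed = set(sub)
        for s in sub:
            closed |= adj[s]
        ext = set(ext)
        while ext:
            w = ext.pop()
            # ESU: add w's exclusive neighbours that are larger than the root.
            new_ext = ext | {u for u in adj[w] if u > root and u not in closed}
            # Children of `sub` sit at depth len(sub) + 1 -> probs[len(sub)].
            if rng.random() < probs[len(sub)]:
                extend(sub + [w], new_ext, root)

    for v in adj:
        if rng.random() < probs[0]:  # depth 1: choice of the root vertex
            extend([v], {u for u in adj[v] if u > v}, v)
    return {label: c / reach for label, c in counts.items()}


if __name__ == "__main__":
    # A triangle 0-1-2 with a pendant vertex 3 attached to 2.
    g = build_adj([(0, 1), (1, 2), (0, 2), (2, 3)])
    print(sample_census(g, 3, [1.0, 1.0, 1.0], random.Random(0)))
    print(sample_census(g, 3, [1.0, 0.8, 0.5], random.Random(1)))
```

Setting every entry of `probs` to 1.0 reduces the sketch to an exact, complete census; smaller values skip whole subtrees, trading precision for speed while keeping the expected estimate equal to the true count.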