Biological network comparison using graphlet degree distribution

Authors: Nataša Pržulj
Affiliations: Computer Science Department, University of California, Irvine, CA 92697-3425, USA
Venue: Bioinformatics
Year: 2007



Abstract

Motivation: Analogous to biological sequence comparison, comparing cellular networks is an important problem that could provide insight into biological understanding and therapeutics. Exact comparison of large networks is computationally infeasible, since it entails subgraph isomorphism, which is NP-complete, and thus heuristics, such as the degree distribution, clustering coefficient, diameter, and relative graphlet frequency distribution, have been sought. It is easy to demonstrate that two networks are different by simply showing a short list of properties in which they differ. It is much harder to show that two networks are similar, as it requires demonstrating their similarity in all of their exponentially many properties. Clearly, it is computationally prohibitive to analyze all network properties, but the larger the number of constraints we impose in determining network similarity, the more likely it is that the networks will truly be similar.

Results: We introduce a new systematic measure of a network's local structure that imposes a large number of similarity constraints on networks being compared. In particular, we generalize the degree distribution, which measures the number of nodes 'touching' k edges, into distributions measuring the number of nodes 'touching' k graphlets, where graphlets are small connected non-isomorphic subgraphs of a large network. Our new measure of network local structure consists of 73 graphlet degree distributions of graphlets with 2–5 nodes, but it is easily extensible to a greater number of constraints (i.e. graphlets), if necessary, and the extensions are limited only by the available CPU time. Furthermore, we show a way to combine the 73 graphlet degree distributions into a network 'agreement' measure, a number between 0 and 1, where 1 means that the networks have identical distributions and 0 means that they are far apart. Based on this new network agreement measure, we show that almost all of the 14 eukaryotic PPI networks, including human, resulting from various high-throughput experimental techniques, as well as from curated databases, are better modeled by geometric random graphs than by Erdős–Rényi, random scale-free, or Barabási–Albert scale-free networks.

Availability: Software executables are available upon request.

Contact: natasha@ics.uci.edu
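
The idea of a graphlet degree distribution and an agreement score can be illustrated with a minimal Python sketch. It is not the author's software. It covers only the four node positions (orbits) in 2- and 3-node graphlets rather than all 73, and the function names are hypothetical. The abstract states only that the agreement lies between 0 and 1; the specific normalization used here (dividing each count by k, rescaling to sum to 1, taking a Euclidean distance scaled by 1/sqrt(2), and averaging over orbits) is an assumption based on how the measure is usually described and should be checked against the full paper.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def orbit_counts(adj):
    """Per-node counts for the four orbits of 2- and 3-node graphlets.

    adj maps each node to the set of its neighbours (undirected, no self-loops).
    Orbit 0: edges touched (the ordinary degree).
    Orbit 1: end of an induced 2-edge path.
    Orbit 2: middle of an induced 2-edge path.
    Orbit 3: member of a triangle.
    """
    counts = {v: [0, 0, 0, 0] for v in adj}
    for v, nbrs in adj.items():
        counts[v][0] = len(nbrs)
        for a, b in combinations(nbrs, 2):
            if b in adj[a]:
                counts[v][3] += 1  # v, a, b form a triangle
            else:
                counts[v][2] += 1  # v is the middle of the path a-v-b
                counts[a][1] += 1  # a and b are its two ends
                counts[b][1] += 1
    return counts


def normalized_distribution(values):
    """Turn per-node orbit counts into a distribution over k >= 1.

    Assumed scaling: the number of nodes touching the orbit k times is
    divided by k, then the result is rescaled to sum to 1.
    """
    d = Counter(k for k in values if k > 0)
    scaled = {k: n / k for k, n in d.items()}
    total = sum(scaled.values())
    return {k: s / total for k, s in scaled.items()} if total else {}


def agreement(adj_g, adj_h):
    """Arithmetic mean over orbits of 1 - (scaled Euclidean distance)."""
    cg, ch = orbit_counts(adj_g), orbit_counts(adj_h)
    per_orbit = []
    for j in range(4):
        ng = normalized_distribution([c[j] for c in cg.values()])
        nh = normalized_distribution([c[j] for c in ch.values()])
        keys = set(ng) | set(nh)
        dist = sqrt(sum((ng.get(k, 0) - nh.get(k, 0)) ** 2 for k in keys)) / sqrt(2)
        per_orbit.append(1 - dist)
    return sum(per_orbit) / len(per_orbit)


triangle = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}}
path = {0: {1}, 1: {0, 2}, 2: {1}}
print(agreement(triangle, triangle), agreement(triangle, path))
```

Comparing a network with itself gives an agreement of 1, while the triangle and the 3-node path score much lower because they differ in every orbit. Extending the sketch to the full set of 73 orbits would require enumerating all 4- and 5-node graphlets, which is where the computational cost mentioned in the abstract arises.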