Markov clustering of protein interaction networks with improved balance and scalability

Authors:
Venu Satuluri;Srinivasan Parthasarathy;Duygu Ucar
Affiliations:
The Ohio State University;The Ohio State University;University of Iowa
Venue:
Proceedings of the First ACM International Conference on Bioinformatics and Computational Biology
Year:
2010

Citing 6

A Fast and High Quality Multilevel Scheme for Partitioning Irregular Graphs
SIAM Journal on Scientific Computing

Dynamic simulation of protein complex formation on a genomic scale
Bioinformatics

Weighted Graph Cuts without Eigenvectors: A Multilevel Approach
IEEE Transactions on Pattern Analysis and Machine Intelligence

Statistical properties of community structure in large social and information networks
Proceedings of the 17th international conference on World Wide Web

Predicting functionality of protein–DNA interactions by integrating diverse evidence
Bioinformatics

Scalable graph clustering using stochastic flows: applications to community discovery
Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining

Cited 3

Protein complex prediction via bottleneck-based graph partitioning
Proceedings of the ACM sixth international workshop on Data and text mining in biomedical informatics

Spectral graph multisection through orthogonality
Proceedings of the 4th MultiClust Workshop on Multiple Clusterings, Multi-view Data, and Multi-source Knowledge-driven Clustering

Identifying protein complexes in AP-MS data with negative evidence via soft Markov clustering
Proceedings of the International Conference on Bioinformatics, Computational Biology and Biomedical Informatics

Quantified Score

Hi-index	0.01

Abstract

Markov Clustering (MCL) is a popular algorithm for clustering biological networks such as protein-protein interaction networks and protein similarity networks. An important requirement when clustering protein networks is to minimize the number of big clusters, since it is generally understood that protein complexes tend not to have more than 15-30 nodes. It is similarly important not to output too many singleton clusters, since they provide little useful information. In this paper, we show how MCL may be modified to better respect these two requirements while still taking the link structure of the graph into account. We build our algorithm on top of Regularized MCL (R-MCL) [16], a previously proposed modification of MCL. Our proposed variation computes a new regularization matrix at each iteration that penalizes big cluster sizes, with the size of the penalty tunable through a balance parameter. The algorithm also fits naturally into a multi-level framework that allows great improvements in speed. We perform experiments on three real protein interaction networks and show significant improvements over MCL in quality, balance and execution speed.
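
The idea of recomputing a size-penalizing regularization matrix at each iteration can be illustrated with a minimal sketch. The abstract does not give the update rule, so everything below is an assumption made for illustration rather than the authors' formulation: the names `balanced_regularizer` and `balanced_rmcl` are hypothetical, the penalty divides each neighbor's weight by its estimated cluster load raised to the power `balance`, and both pruning and the multi-level coarsening step are omitted.

```python
import numpy as np


def column_normalize(m):
    """Rescale every column to sum to 1 (column-stochastic matrix)."""
    sums = m.sum(axis=0, keepdims=True)
    sums[sums == 0] = 1.0  # leave all-zero columns untouched
    return m / sums


def inflate(m, r):
    """MCL inflation: raise entries to the power r, then renormalize columns."""
    return column_normalize(np.power(m, r))


def balanced_regularizer(m, m_g, balance):
    """Hypothetical size-penalizing regularization matrix (an assumption).

    m[i, k] is the flow from node k to node i. mass[i] is the total flow
    arriving at node i, used here as a proxy for the size of the cluster
    that node i attracts. load[k] is the expected mass of the cluster that
    node k currently flows to. Neighbors with a high load get less weight.
    """
    mass = m.sum(axis=1)
    load = m.T @ mass  # load[k] = sum_i m[i, k] * mass[i]
    return column_normalize(m_g / np.power(load, balance)[:, None])


def balanced_rmcl(adj, inflation=2.0, balance=0.5, iters=30):
    """Regularize-then-inflate loop; returns the flow matrix and node labels."""
    n = adj.shape[0]
    m_g = column_normalize(adj.astype(float) + np.eye(n))  # add self-loops
    m = m_g.copy()
    for _ in range(iters):
        m_r = balanced_regularizer(m, m_g, balance)  # recomputed every iteration
        m = inflate(m @ m_r, inflation)
    return m, m.argmax(axis=0)  # each node is labeled by its strongest attractor


# Toy network: two triangles (0-1-2 and 3-4-5) joined by the edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
adj = np.zeros((6, 6))
for u, v in edges:
    adj[u, v] = adj[v, u] = 1.0

flow, labels = balanced_rmcl(adj, balance=0.5)
print(labels)
```

In this sketch, setting `balance=0` makes the regularization matrix equal to the fixed flow matrix of the input graph, so the loop reduces to a plain regularize-and-inflate iteration. Larger values shift weight away from neighbors that already flow into heavily loaded attractors.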