A Fast and High Quality Multilevel Scheme for Partitioning Irregular Graphs
SIAM Journal on Scientific Computing
Weighted Graph Cuts without Eigenvectors: A Multilevel Approach
IEEE Transactions on Pattern Analysis and Machine Intelligence
Statistical properties of community structure in large social and information networks
Proceedings of the 17th international conference on World Wide Web
Scalable graph clustering using stochastic flows: applications to community discovery
Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining
Protein complex prediction via bottleneck-based graph partitioning
Proceedings of the ACM sixth international workshop on Data and text mining in biomedical informatics
Spectral graph multisection through orthogonality
Proceedings of the 4th MultiClust Workshop on Multiple Clusterings, Multi-view Data, and Multi-source Knowledge-driven Clustering
Identifying protein complexes in AP-MS data with negative evidence via soft Markov clustering
Proceedings of the International Conference on Bioinformatics, Computational Biology and Biomedical Informatics
Markov Clustering (MCL) is a popular algorithm for clustering networks in bioinformatics, such as protein-protein interaction networks and protein similarity networks. An important requirement when clustering protein networks is to minimize the number of big clusters, since it is generally understood that protein complexes tend not to have more than 15-30 nodes. Similarly, it is important not to output too many singleton clusters, since they provide little useful information. In this paper, we show how MCL may be modified to better respect these two requirements while still taking the link structure of the graph into account. We design our algorithm on top of Regularized MCL (R-MCL) [16], a previously proposed modification of MCL. Our proposed variation computes a new regularization matrix at each iteration that penalizes big cluster sizes, with the size of the penalty tunable through a balance parameter. The algorithm also fits naturally into a multilevel framework that allows great improvements in speed. We perform experiments on three real protein interaction networks and show significant improvements over MCL in quality, balance, and execution speed.
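To make the idea of a per-iteration, size-penalizing regularization matrix concrete, the following is a minimal sketch and not the paper's implementation. The abstract does not state the penalty formula, so the form used here is an assumption: each neighbor's weight is scaled by the expected "mass" of the cluster its flow currently points to, raised to the power `-balance`. The regularize step `M := M * M_R` follows R-MCL as we understand it, and with `balance=0` the sketch reduces to that step with a fixed matrix. The names `balanced_rmcl_sketch`, `normalize_columns`, and `inflate`, and all default values, are hypothetical. The multilevel (coarsening) framework is not shown.

```python
import numpy as np


def normalize_columns(m):
    """Scale every column to sum to 1 (column-stochastic matrix)."""
    sums = m.sum(axis=0, keepdims=True)
    sums[sums == 0] = 1.0  # leave all-zero columns untouched
    return m / sums


def inflate(m, r):
    """MCL inflation: element-wise power followed by column normalization."""
    return normalize_columns(np.power(m, r))


def balanced_rmcl_sketch(adj, inflation=2.0, balance=1.0, iters=50):
    """Illustrative balance-penalized variant of R-MCL (assumed penalty form).

    adj       : symmetric, non-negative adjacency matrix (n x n)
    inflation : MCL inflation exponent
    balance   : penalty strength; 0 gives a fixed regularization matrix
    Returns one label per node: the row index of its strongest attractor.
    """
    n = adj.shape[0]
    a = adj + np.eye(n)              # add self-loops, as is customary for MCL
    m = normalize_columns(a)         # initial flow matrix
    for _ in range(iters):
        # mass[i]: total flow currently arriving at node i, used here as a
        # proxy for the size of the cluster that node i attracts.
        mass = m.sum(axis=1)
        # propensity[k]: expected mass of the attractors node k flows to.
        propensity = m.T @ mass
        # ASSUMPTION: down-weight neighbors that flow into heavy clusters.
        weights = np.power(np.maximum(propensity, 1e-12), -balance)
        m_r = normalize_columns(a * weights[:, None])
        m = m @ m_r                  # regularize step (replaces MCL expansion)
        m = inflate(m, inflation)    # inflation step
    return m.argmax(axis=0)
```

A possible usage is to run the function with several values of `balance` on the same adjacency matrix and compare the resulting cluster-size distributions; how closely this penalty tracks the one in the paper would need to be checked against the full text.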