Graph Clustering Via a Discrete Uncoupling Process

Authors:
Stijn Van Dongen
Affiliations:
svd@sanger.ac.uk
Venue:
SIAM Journal on Matrix Analysis and Applications
Year:
2008

Citing 0
Cited 22

A high availability clustering and load balancing mechanism for information security infrastructure system
Proceedings of the 2009 International Conference on Hybrid Information Technology

Use of ternary similarities in graph based clustering for protein structural family classification
Proceedings of the First ACM International Conference on Bioinformatics and Computational Biology

Task 5: Single document keyphrase extraction using sentence clustering and latent Dirichlet allocation
SemEval '10 Proceedings of the 5th International Workshop on Semantic Evaluation

Identifying hotspots on the real-time web
CIKM '10 Proceedings of the 19th ACM international conference on Information and knowledge management

Transient crowd discovery on the real-time social web
Proceedings of the fourth ACM international conference on Web search and data mining

Is there a best quality metric for graph clusters?
ECML PKDD'11 Proceedings of the 2011 European conference on Machine learning and knowledge discovery in databases - Volume Part I

Automated segmentation of DNA sequences with complex evolutionary histories
WABI'11 Proceedings of the 11th international conference on Algorithms in bioinformatics

The Combinatorial BLAS: design, implementation, and applications
International Journal of High Performance Computing Applications

Detecting communities in sparse MANETs
IEEE/ACM Transactions on Networking (TON)

CLARM: an integrative approach for functional modules discovery
Proceedings of the 2nd ACM Conference on Bioinformatics, Computational Biology and Biomedicine

MadLINQ: large-scale distributed matrix computation for the cloud
Proceedings of the 7th ACM european conference on Computer Systems

Fast Parallel Markov Clustering in Bioinformatics Using Massively Parallel Computing on GPU with CUDA and ELLPACK-R Sparse Format
IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB)

On the separability of structural classes of communities
Proceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and data mining

Rel-grams: a probabilistic model of relations in text
AKBC-WEKEX '12 Proceedings of the Joint Workshop on Automatic Knowledge Base Construction and Web-scale Knowledge Extraction

Robust community detection methods with resolution parameter for complex detection in protein protein interaction networks
PRIB'12 Proceedings of the 7th IAPR international conference on Pattern Recognition in Bioinformatics

Interconnection of asynchronous Boolean networks, asymptotic and transient dynamics
Automatica (Journal of IFAC)

Communication optimal parallel multiplication of sparse random matrices
Proceedings of the twenty-fifth annual ACM symposium on Parallelism in algorithms and architectures

Towards realistic artificial benchmark for community detection algorithms evaluation
International Journal of Web Based Communities

Locating communities on graphs with variations in community sizes
The Journal of Supercomputing

A separability framework for analyzing community structure
ACM Transactions on Knowledge Discovery from Data (TKDD) - Casin special issue

Efficient community detection with additive constrains on large networks
Knowledge-Based Systems

Refactoring packages of object-oriented software using genetic algorithm based community detection technique
International Journal of Computer Applications in Technology

Abstract

A discrete uncoupling process for finite spaces is introduced, called the Markov Cluster Process or the MCL process. The process is the engine for the graph clustering algorithm called the MCL algorithm. The MCL process takes a stochastic matrix as input, and then alternates expansion and inflation, each step defining a stochastic matrix in terms of the previous one. Expansion corresponds to taking the $k$th power of a stochastic matrix, where $k\in\mathbb{N}$. Inflation corresponds to a parametrized operator $\Gamma_r$, $r\geq 0$, that maps the set of (column) stochastic matrices onto itself. The image $\Gamma_r M$ is obtained by raising each entry in $M$ to the $r$th power and rescaling each column to have sum 1 again. In practice the process converges very fast towards a limit that is invariant under both matrix multiplication and inflation, with quadratic convergence around the limit points. The heuristic behind the process is its expected behavior for (Markov) graphs possessing cluster structure. The process is typically applied to the matrix of random walks on a given graph $G$, and the connected components of (the graph associated with) the process limit generically allow a clustering interpretation of $G$. The limit is in general extremely sparse and iterands are sparse in a weighted sense, implying that the MCL algorithm is very fast and highly scalable. Several mathematical properties of the MCL process are established. Most notably, the process (and algorithm) iterands possess structural properties generalizing the mapping from process limits onto clusterings. The inflation operator $\Gamma_r$ maps the class of matrices that are diagonally similar to a symmetric matrix onto itself. The phrase diagonally positive semi-definite (dpsd) is used for matrices that are diagonally similar to a positive semi-definite matrix. For $r\in\mathbb{N}$ and for $M$ a stochastic dpsd matrix, the image $\Gamma_r M$ is again dpsd. 
Determinantal inequalities satisfied by a dpsd matrix $M$ imply a natural ordering among the diagonal elements of $M$, generalizing the mapping of process limits onto clusterings. The spectrum of $\Gamma_{\infty} M$ is of the form $\{0^{n-k}, 1^k\}$, where $k$ is the number of endclasses of the ordering associated with $M$, and $n$ is the dimension of $M$. This attests to the uncoupling effect of the inflation operator.
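
The alternation of expansion and inflation described in the abstract can be sketched in a few lines of plain Python. This is only an illustrative sketch, not the author's implementation: the function names, the defaults $k=2$ and $r=2$, the stopping tolerance, the threshold `eps`, and the addition of self-loops to the input graph are all assumptions made for the example.

```python
# Illustrative sketch of the MCL process on a small dense matrix.
# Function names, defaults, tolerance, eps, and the self-loops are
# assumptions for this example, not taken from the paper.

def column_normalize(m):
    """Rescale each column of a square matrix (list of rows) to sum to 1."""
    n = len(m)
    sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return [[m[i][j] / sums[j] for j in range(n)] for i in range(n)]

def expand(m, k=2):
    """Expansion: the k-th matrix power of m, for an integer k >= 1."""
    n = len(m)
    result = m
    for _ in range(k - 1):
        result = [[sum(result[i][t] * m[t][j] for t in range(n))
                   for j in range(n)] for i in range(n)]
    return result

def inflate(m, r=2.0):
    """Inflation: raise each entry to the r-th power, then rescale columns."""
    return column_normalize([[x ** r for x in row] for row in m])

def mcl_process(m, k=2, r=2.0, max_iter=100, tol=1e-9):
    """Alternate expansion and inflation until the iterand stops changing."""
    for _ in range(max_iter):
        nxt = inflate(expand(m, k), r)
        change = max(abs(a - b)
                     for row_a, row_b in zip(nxt, m)
                     for a, b in zip(row_a, row_b))
        m = nxt
        if change < tol:
            break
    return m

def clusters_from_limit(m, eps=1e-6):
    """Connected components of the graph with an edge wherever m[i][j] > eps."""
    n = len(m)
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i in range(n):
        for j in range(n):
            if m[i][j] > eps:
                parent[find(i)] = find(j)
    groups = {}
    for v in range(n):
        groups.setdefault(find(v), []).append(v)
    return sorted(groups.values())

def random_walk_matrix(edges, n):
    """Column-stochastic random-walk matrix of an undirected graph.
    Self-loops are added here as an assumption of this sketch."""
    adj = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for u, v in edges:
        adj[u][v] = adj[v][u] = 1.0
    return column_normalize(adj)

# Hypothetical input: two triangles joined by the single edge (2, 3).
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
limit = mcl_process(random_walk_matrix(edges, 6))
clusters = clusters_from_limit(limit)
print(clusters)
```

The dense list-of-rows representation is chosen only for readability; it does not exploit the sparsity of the iterands that the abstract credits for the algorithm's speed and scalability. Which clustering the example graph yields depends on the choice of $k$ and $r$.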