A new cluster algorithm for graphs

Authors:
Stijn Dongen
Affiliations:
-
Venue:
A new cluster algorithm for graphs
Year:
1998

Citing 0
Cited 4

A Distributed Approach to Node Clustering in Decentralized Peer-to-Peer Networks

IEEE Transactions on Parallel and Distributed Systems
Clustering social networks

WAW'07 Proceedings of the 5th international conference on Algorithms and models for the web-graph
SDC: a distributed clustering protocol for peer-to-peer networks

NETWORKING'06 Proceedings of the 5th international IFIP-TC6 conference on Networking Technologies, Services, and Protocols; Performance of Computer and Communication Networks; Mobile and Wireless Communications Systems
Spreading activation model for connectivity based clustering

ADVIS'06 Proceedings of the 4th international conference on Advances in Information Systems

Quantified Score

Hi-index	0.00

Visualization

Abstract

A new cluster algorithm for graphs called the \emph{Markov Cluster algorithm} ($MCL$ algorithm) is introduced. The graphs may be both weighted (with nonnegative weight) and directed. Let~$G$~be such a graph. The $MCL$ algorithm simulates flow in $G$ by first identifying $G$ in a canonical way with a Markov graph $G_1$. Flow is then alternatingly expanded and contracted, leading to a row of Markov Graphs $G_{(i)}$. The expansion step is done by computing higher step transition probabilities ($TP$''s), the contraction step creates a new Markov graph by favouring high $TP$''s and demoting low $TP$''s in a specific way. The heuristic underlying this approach is the expectation that flow between dense regions which are sparsely connected will evaporate. The stable limits of the process are easily derived and in practice the algorithm converges very fast to such a limit, the structure of which has a generic interpretation as an overlapping clustering of the graph~$G$. Overlap is limited to cases where the input graph has a symmetric structure inducing it. The contraction and expansion parameters of the algorithm influence the granularity of the output. The algorithm is space and time efficient with a space$+$quality/time trade--off, works very well for a wide range of test cases, and lends itself to drastic scaling. Experiments with a scaled $C$--implementation have been conducted on graphs having several tens of thousands of nodes. This report describes the algorithm, its complexity, and experimental results. The algorithm is introduced by first considering a generalization of generic single link clustering for graphs called $k$--path clustering.