A new approach to data driven clustering

Authors:
Arik Azran; Zoubin Ghahramani
Affiliations:
University College London, UK; University of Cambridge, Cambridge, UK
Venue:
ICML '06 Proceedings of the 23rd international conference on Machine learning
Year:
2006

Citing 4

Data clustering: a review
ACM Computing Surveys (CSUR)

Normalized Cuts and Image Segmentation
IEEE Transactions on Pattern Analysis and Machine Intelligence

On clusterings-good, bad and spectral
FOCS '00 Proceedings of the 41st Annual Symposium on Foundations of Computer Science

Nearly-linear time algorithms for graph partitioning, graph sparsification, and solving linear systems
STOC '04 Proceedings of the thirty-sixth annual ACM symposium on Theory of computing

Cited 3

Graph partitioning into isolated, high conductance clusters: theory, computation and applications to preconditioning
Proceedings of the twentieth annual symposium on Parallelism in algorithms and architectures

Maximum margin clustering made practical
IEEE Transactions on Neural Networks

Hierarchical verb clustering using graph factorization
EMNLP '11 Proceedings of the Conference on Empirical Methods in Natural Language Processing

Abstract

We consider the problem of clustering in its most basic form, where only a local metric on the data space is given. No parametric statistical model is assumed, and the number of clusters is learned from the data. We introduce, analyze and demonstrate a novel approach to clustering where data points are viewed as nodes of a graph, and pairwise similarities are used to derive a transition probability matrix P for a Markov random walk between them. The algorithm automatically reveals structure at increasing scales by varying the number of steps taken by this random walk. Each point is represented by its row of P^t, the t-step distribution of the walk starting at that point; these distributions are then clustered using a KL-minimizing iterative algorithm. Both the number of clusters and the number of steps that 'best reveal' it are found by optimizing spectral properties of P.
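The pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and several choices are assumptions made only for illustration: a Gaussian kernel with a hypothetical bandwidth `sigma` stands in for the unspecified pairwise similarity, the number of clusters `k` and the number of steps `t` are fixed by hand (the paper selects both from spectral properties of P, which is not shown here), and the KL-minimizing step is written as a K-means-style loop with farthest-first initialization and mean-of-members prototypes. All function names are hypothetical.

```python
import math

def transition_matrix(points, sigma=1.0):
    """Row-stochastic matrix P from Gaussian similarities (kernel choice is an assumption)."""
    P = []
    for a in points:
        w = [math.exp(-math.dist(a, b) ** 2 / (2 * sigma ** 2)) for b in points]
        total = sum(w)  # always > 0 because the self-similarity is 1
        P.append([x / total for x in w])
    return P

def mat_mul(A, B):
    cols = list(zip(*B))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in A]

def mat_power(P, t):
    """P^t for an integer t >= 1; row i is the t-step distribution starting at point i."""
    result = P
    for _ in range(t - 1):
        result = mat_mul(result, P)
    return result

def kl(p, q, eps=1e-300):
    """KL(p || q); terms with p_i == 0 contribute nothing, q_i is floored at eps."""
    return sum(x * math.log(x / max(y, eps)) for x, y in zip(p, q) if x > 0)

def kl_cluster(rows, k, iters=50):
    """Assumed K-means-style loop: assign each row to the prototype with the
    smallest KL divergence, then replace each prototype by the mean of its members."""
    # Farthest-first initialization (an illustrative choice, deterministic).
    centers = [rows[0]]
    while len(centers) < k:
        centers.append(max(rows, key=lambda r: min(kl(r, c) for c in centers)))
    labels = None
    for _ in range(iters):
        new_labels = [min(range(k), key=lambda j: kl(r, centers[j])) for r in rows]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for j in range(k):
            members = [r for r, lab in zip(rows, labels) if lab == j]
            if members:  # keep the old prototype if a cluster is empty
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels

# Two well-separated groups of three points each.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
P = transition_matrix(points, sigma=0.5)
labels = kl_cluster(mat_power(P, t=3), k=2)
print(labels)
```

The mean of the member distributions is used as the prototype because it minimizes the summed KL divergence from the members to a single distribution; larger values of `t` smooth the rows of P^t and would tend to merge nearby groups, which is the "increasing scales" behavior the abstract refers to.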