Graph based k-means clustering

Authors:
Laurent Galluccio; Olivier Michel; Pierre Comon; Alfred O. Hero, III
Affiliations:
I3S, UMR6070 CNRS, University of Nice-Sophia Antipolis, 2000 route des Lucioles, 06903 Sophia Antipolis Cedex, France and Laboratoire Cassiopée UMR 6202, University of Nice Sophia Antipolis, ...;Gipsa-Lab UMR 5216, 961 rue de la Houille Blanche, BP 46, 38402 Saint Martin d'Heres Cedex, France;I3S, UMR6070 CNRS, University of Nice-Sophia Antipolis, 2000 route des Lucioles, 06903 Sophia Antipolis Cedex, France;Department of Electrical Engineering and Computer Science, University of Michigan, 1301 Beal Avenue, Ann Arbor, MI 48109-2122, USA
Venue:
Signal Processing
Year:
2012

Abstract

We propose an original approach to clustering multi-component data sets that also estimates the number of clusters. Using Prim's algorithm to construct a minimal spanning tree (MST), we show that, under the assumption that the vertices are approximately distributed according to a homogeneous spatial Poisson process, the number of clusters can be accurately estimated by thresholding the sequence of edge lengths added to the MST by Prim's algorithm. This sequence, called the Prim trajectory, contains sufficient information to determine both the number of clusters and the approximate locations of the cluster centroids. The estimated number of clusters and the estimated centroids are then used to initialize the generalized Lloyd algorithm, also known as k-means, thereby circumventing its well-known initialization problems. We evaluate the false positive rate of our cluster detection algorithm using Poisson approximations in Euclidean spaces. Applications of this method to multi- and hyperspectral imagery, namely a satellite view of Paris and an image of Mars, are also presented.
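
The pipeline described in the abstract can be illustrated with a minimal Python sketch. It is a simplified reading of the abstract, not the authors' implementation: the function names, the fixed threshold `eps`, and the minimum run length `min_size` are all assumptions introduced here, and the threshold is supplied by hand rather than derived from the Poisson model used in the paper. The sketch records the Prim trajectory, splits the visiting order wherever an edge exceeds `eps`, takes the mean of each sufficiently long run as an initial centroid, and then runs plain Lloyd iterations.

```python
import numpy as np

def prim_trajectory(points, start=0):
    """Run Prim's algorithm on the complete Euclidean graph.

    Returns the order in which vertices join the tree and the length of
    the edge added at each step (the "Prim trajectory").
    """
    n = len(points)
    in_tree = np.zeros(n, dtype=bool)
    dist = np.full(n, np.inf)  # distance from each vertex to the current tree
    in_tree[start] = True
    order, lengths = [start], []
    current = start
    for _ in range(n - 1):
        # Update tree distances using the vertex that was added last.
        d = np.linalg.norm(points - points[current], axis=1)
        dist = np.minimum(dist, d)
        dist[in_tree] = np.inf  # never re-select a vertex already in the tree
        current = int(np.argmin(dist))
        lengths.append(dist[current])
        in_tree[current] = True
        order.append(current)
    return np.array(order), np.array(lengths)

def estimate_centroids(points, eps, min_size=5):
    """Split the trajectory at edges longer than eps (assumed threshold).

    Each run of at least min_size consecutively visited points is treated
    as one cluster; its mean is returned as an initial centroid.
    """
    order, lengths = prim_trajectory(points)
    # lengths[i] is the edge that adds order[i + 1], so a long edge at
    # position i starts a new run at index i + 1.
    breaks = np.flatnonzero(lengths > eps) + 1
    runs = np.split(order, breaks)
    return np.array([points[r].mean(axis=0) for r in runs if len(r) >= min_size])

def lloyd(points, centroids, n_iter=50):
    """Plain Lloyd (k-means) iterations from the given initial centroids."""
    centroids = centroids.copy()
    for _ in range(n_iter):
        d = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        for k in range(len(centroids)):
            members = points[labels == k]
            if len(members):  # leave an empty cluster's centroid unchanged
                centroids[k] = members.mean(axis=0)
    return centroids, labels

# Synthetic example: three well-separated Gaussian blobs in the plane.
rng = np.random.default_rng(0)
true_centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
data = np.vstack([c + 0.3 * rng.standard_normal((100, 2)) for c in true_centers])

init = estimate_centroids(data, eps=2.0)
centroids, labels = lloyd(data, init)
print("estimated number of clusters:", len(centroids))
```

On well-separated clusters, Prim's algorithm exhausts the short edges inside one cluster before it takes a long edge to the next, which is why the long edges in the trajectory mark cluster boundaries. With overlapping clusters or background noise the runs are less clean, and the choice of threshold becomes the central issue that the paper addresses.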