Fast approximate spectral clustering

Authors:
Donghui Yan; Ling Huang; Michael I. Jordan
Affiliations:
University of California, Berkeley, Berkeley, CA, USA; Intel, Berkeley, CA, USA; University of California, Berkeley, Berkeley, CA, USA
Venue:
Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining
Year:
2009

Abstract

Spectral clustering refers to a flexible class of clustering procedures that can produce high-quality clusterings on small data sets but has limited applicability to large-scale problems because its computational complexity is O(n^3) in general, where n is the number of data points. We extend the range of spectral clustering by developing a general framework for fast approximate spectral clustering in which a distortion-minimizing local transformation is first applied to the data. This framework is based on a theoretical analysis that provides a statistical characterization of the effect of local distortion on the mis-clustering rate. We develop two concrete instances of our general framework, one based on local k-means clustering (KASP) and one based on random projection trees (RASP). Extensive experiments show that these algorithms can achieve significant speedups with little degradation in clustering accuracy. Specifically, our algorithms outperform k-means by a large margin in terms of accuracy, and run several times faster than approximate spectral clustering based on the Nyström method, with comparable accuracy and a significantly smaller memory footprint. Remarkably, our algorithms make it possible for a single machine to apply spectral clustering to data sets with a million observations within several minutes.
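The k-means-based instance (KASP) described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the pipeline is: reduce the n points to a small set of representatives with k-means, run spectral clustering on the representatives only, then give each point the cluster label of its representative. The function names (`kmeans`, `spectral_clustering`, `kasp`), the Gaussian affinity with bandwidth `sigma`, the maximin seeding, and the normalized-Laplacian embedding are all assumptions made for illustration; the abstract does not specify any of them.

```python
import numpy as np

def _maximin_init(X, k, rng):
    # Assumed seeding: one random point, then repeatedly the point
    # farthest from all seeds chosen so far.
    idx = [int(rng.integers(len(X)))]
    d2 = ((X - X[idx[0]]) ** 2).sum(axis=1)
    for _ in range(1, k):
        nxt = int(d2.argmax())
        idx.append(nxt)
        d2 = np.minimum(d2, ((X - X[nxt]) ** 2).sum(axis=1))
    return X[idx].copy()

def _sq_dists(A, B):
    # Pairwise squared Euclidean distances between rows of A and rows of B.
    return ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)

def kmeans(X, k, n_iter=50, seed=0):
    # Lloyd's algorithm; returns (centroids, label of each row of X).
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    centroids = _maximin_init(X, k, rng)
    for _ in range(n_iter):
        labels = _sq_dists(X, centroids).argmin(axis=1)
        for j in range(k):
            members = X[labels == j]
            if len(members):  # an empty cluster keeps its old centroid
                centroids[j] = members.mean(axis=0)
    labels = _sq_dists(X, centroids).argmin(axis=1)
    return centroids, labels

def spectral_clustering(Y, n_clusters, sigma, seed=0):
    # Normalized spectral clustering on the (small) representative set Y.
    W = np.exp(-_sq_dists(Y, Y) / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    deg = np.maximum(W.sum(axis=1), np.finfo(float).tiny)
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    L_sym = np.eye(len(Y)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(L_sym)   # eigenvalues in ascending order
    U = vecs[:, :n_clusters]          # eigenvectors of the smallest eigenvalues
    U = U / np.linalg.norm(U, axis=1, keepdims=True)
    _, labels = kmeans(U, n_clusters, seed=seed)
    return labels

def kasp(X, n_clusters, n_representatives, sigma, seed=0):
    # 1) local k-means, 2) spectral clustering on centroids, 3) label lookup.
    centroids, assignment = kmeans(X, n_representatives, seed=seed)
    rep_labels = spectral_clustering(centroids, n_clusters, sigma, seed=seed)
    return rep_labels[assignment]
```

In this sketch the eigendecomposition runs on a matrix whose side is `n_representatives` rather than n, which is where the cost saving over the O(n^3) baseline would come from. The accuracy achieved depends on how well the representatives preserve the local structure of the data.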