Spectral clustering refers to a flexible class of clustering procedures that can produce high-quality clusterings on small data sets but which has limited applicability to large-scale problems due to its computational complexity of O(n^3) in general, with n the number of data points. We extend the range of spectral clustering by developing a general framework for fast approximate spectral clustering in which a distortion-minimizing local transformation is first applied to the data. This framework is based on a theoretical analysis that provides a statistical characterization of the effect of local distortion on the mis-clustering rate. We develop two concrete instances of our general framework, one based on local k-means clustering (KASP) and one based on random projection trees (RASP). Extensive experiments show that these algorithms can achieve significant speedups with little degradation in clustering accuracy. Specifically, our algorithms outperform k-means by a large margin in terms of accuracy, and run several times faster than approximate spectral clustering based on the Nyström method, with comparable accuracy and a significantly smaller memory footprint. Remarkably, our algorithms make it possible for a single machine to perform spectral clustering on data sets with a million observations within several minutes.
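The KASP pipeline described above (replace the data by a smaller set of k-means centroids, spectral-cluster the centroids, then propagate labels back) can be sketched as follows. This is a minimal illustration assuming scikit-learn; the function name `kasp` and all parameter choices are ours, not from the paper's implementation.

```python
# Hedged sketch of KASP: k-means acts as the distortion-minimizing local
# transformation, spectral clustering runs only on the centroids, and each
# original point inherits the cluster label of its nearest centroid.
import numpy as np
from sklearn.cluster import KMeans, SpectralClustering

def kasp(X, n_clusters, n_representatives, seed=0):
    # Step 1: local k-means; each point is mapped to its nearest centroid.
    km = KMeans(n_clusters=n_representatives, n_init=10,
                random_state=seed).fit(X)

    # Step 2: spectral clustering on the much smaller centroid set
    # (n_representatives points instead of n), avoiding the O(n^3) cost.
    sc = SpectralClustering(n_clusters=n_clusters, affinity="rbf",
                            random_state=seed)
    centroid_labels = sc.fit_predict(km.cluster_centers_)

    # Step 3: propagate centroid labels back to the original points.
    return centroid_labels[km.labels_]

# Usage on two well-separated synthetic blobs.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(size=(200, 2)),
               rng.normal(size=(200, 2)) + 6.0])
labels = kasp(X, n_clusters=2, n_representatives=40)
```

The speedup comes from Step 2: the eigendecomposition is performed on an affinity matrix of size n_representatives rather than n, while Steps 1 and 3 are linear-time passes over the data.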