Algorithms for clustering data
Vector quantization and signal compression
The symmetric eigenvalue problem
Property testing and its connection to learning and approximation
Journal of the ACM (JACM)
Authoritative sources in a hyperlinked environment
Proceedings of the ninth annual ACM-SIAM symposium on Discrete algorithms
The analysis of a simple k-means clustering algorithm
Proceedings of the sixteenth annual symposium on Computational geometry
Latent semantic indexing: a probabilistic analysis
Journal of Computer and System Sciences - Special issue on the seventeenth ACM SIGACT-SIGMOD-SIGART symposium on principles of database systems
A sharp threshold in proof complexity
STOC '01 Proceedings of the thirty-third annual ACM symposium on Theory of computing
Polynomial-time approximation schemes for geometric min-sum median clustering
Journal of the ACM (JACM)
A constant-factor approximation algorithm for the k-median problem
Journal of Computer and System Sciences - STOC 1999
Clustering Categorical Data: An Approach Based on Dynamical Systems
VLDB '98 Proceedings of the 24th International Conference on Very Large Data Bases
Sampling lower bounds via information theory
Proceedings of the thirty-fifth annual ACM symposium on Theory of computing
Fast Monte-Carlo Algorithms for finding low-rank approximations
FOCS '98 Proceedings of the 39th Annual Symposium on Foundations of Computer Science
Improved Combinatorial Algorithms for the Facility Location and k-Median Problems
FOCS '99 Proceedings of the 40th Annual Symposium on Foundations of Computer Science
Primal-Dual Approximation Algorithms for Metric Facility Location and k-Median Problems
FOCS '99 Proceedings of the 40th Annual Symposium on Foundations of Computer Science
Fast Monte-Carlo Algorithms for Approximate Matrix Multiplication
FOCS '01 Proceedings of the 42nd IEEE symposium on Foundations of Computer Science
The complexity of massive data set computations
Fast Monte-Carlo algorithms for finding low-rank approximations
Journal of the ACM (JACM)
Subproblem optimization by gene correlation with singular value decomposition
GECCO '05 Proceedings of the 7th annual conference on Genetic and evolutionary computation
Spectral techniques for graph bisection in genetic algorithms
Proceedings of the 8th annual conference on Genetic and evolutionary computation
Online clustering of parallel data streams
Data & Knowledge Engineering
Latent linkage semantic kernels for collective classification of link data
Journal of Intelligent Information Systems
Exploiting asymmetry in hierarchical topic extraction
CIKM '06 Proceedings of the 15th ACM international conference on Information and knowledge management
Fast computation of low-rank matrix approximations
Journal of the ACM (JACM)
Sampling from large matrices: An approach through geometric functional analysis
Journal of the ACM (JACM)
k-means++: the advantages of careful seeding
SODA '07 Proceedings of the eighteenth annual ACM-SIAM symposium on Discrete algorithms
Spectral clustering in telephone call graphs
Proceedings of the 9th WebKDD and 1st SNA-KDD 2007 workshop on Web mining and social network analysis
A continuous facility location problem and its application to a clustering problem
Proceedings of the 2008 ACM symposium on Applied computing
Approximation algorithms for co-clustering
Proceedings of the twenty-seventh ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
Proceedings of the twentieth annual symposium on Parallelism in algorithms and architectures
An approximation ratio for biclustering
Information Processing Letters
PCA and SVD with nonnegative loadings
Pattern Recognition
The Planar k-Means Problem is NP-Hard
WALCOM '09 Proceedings of the 3rd International Workshop on Algorithms and Computation
Graph nodes clustering with the sigmoid commute-time kernel: A comparative study
Data & Knowledge Engineering
A global optimization method for semi-supervised clustering
Data Mining and Knowledge Discovery
NP-hardness of Euclidean sum-of-squares clustering
Machine Learning
Latent space domain transfer between high dimensional overlapping distributions
Proceedings of the 18th international conference on World wide web
Spectral Clustering in Social Networks
Advances in Web Mining and Web Usage Analysis
Privacy-Preserving Clustering with High Accuracy and Low Time Complexity
DASFAA '09 Proceedings of the 14th International Conference on Database Systems for Advanced Applications
A spectral-based clustering algorithm for categorical data using data summaries
Proceedings of the 2nd Workshop on Data Mining using Matrices and Tensors
Foundations and Trends® in Theoretical Computer Science
Random projection trees for vector quantization
IEEE Transactions on Information Theory
Singular value decomposition in additive, multiplicative, and logistic forms
Pattern Recognition
Data clustering: 50 years beyond K-means
Pattern Recognition Letters
GPC'07 Proceedings of the 2nd international conference on Advances in grid and pervasive computing
Graph nodes clustering based on the commute-time kernel
PAKDD'07 Proceedings of the 11th Pacific-Asia conference on Advances in knowledge discovery and data mining
Eigenvector-based clustering using aggregated similarity matrices
Proceedings of the 2010 ACM Symposium on Applied Computing
On the isoperimetric spectrum of graphs and its approximations
Journal of Combinatorial Theory Series B
A robust iterative refinement clustering algorithm with smoothing search space
Knowledge-Based Systems
Spectral methods for matrices and tensors
Proceedings of the forty-second ACM symposium on Theory of computing
Traffic-based network clustering
Proceedings of the 6th International Wireless Communications and Mobile Computing Conference
Flexible constrained spectral clustering
Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining
Algorithms and theory of computation handbook
Stochastic algorithms in linear algebra: beyond the Markov chains and von Neumann-Ulam scheme
NMA'10 Proceedings of the 7th international conference on Numerical methods and applications
The complexity status of problems related to sparsest cuts
IWOCA'10 Proceedings of the 21st international conference on Combinatorial algorithms
Independent Component Analysis Based Seeding Method for K-Means Clustering
WI-IAT '11 Proceedings of the 2011 IEEE/WIC/ACM International Conferences on Web Intelligence and Intelligent Agent Technology - Volume 03
Data reduction for weighted and outlier-resistant clustering
Proceedings of the twenty-third annual ACM-SIAM symposium on Discrete Algorithms
Personalized news categorization through scalable text classification
APWeb'06 Proceedings of the 8th Asia-Pacific Web conference on Frontiers of WWW Research and Development
On the complexity of several haplotyping problems
WABI'05 Proceedings of the 5th International conference on Algorithms in Bioinformatics
A fast random sampling algorithm for sparsifying matrices
APPROX'06/RANDOM'06 Proceedings of the 9th international conference on Approximation Algorithms for Combinatorial Optimization Problems, and 10th international conference on Randomization and Computation
Randomized Algorithms for Matrices and Data
Foundations and Trends® in Machine Learning
The complexity of finding uniform sparsest cuts in various graph classes
Journal of Discrete Algorithms
The planar k-means problem is NP-hard
Theoretical Computer Science
Collaborative similarity measure for intra graph clustering
DASFAA'12 Proceedings of the 17th international conference on Database Systems for Advanced Applications
Drawing Large Graphs by Low-Rank Stress Majorization
Computer Graphics Forum
The singular values and vectors of low rank perturbations of large rectangular random matrices
Journal of Multivariate Analysis
The effectiveness of Lloyd-type methods for the k-means problem
Journal of the ACM (JACM)
Clustering genome data based on approximate matching
International Journal of Data Analysis Techniques and Strategies
Low rank approximation and regression in input sparsity time
Proceedings of the forty-fifth annual ACM symposium on Theory of computing
Fuzzy and hard clustering analysis for thyroid disease
Computer Methods and Programs in Biomedicine
Anomaly detection in large-scale data stream networks
Data Mining and Knowledge Discovery
On constrained spectral clustering and its applications
Data Mining and Knowledge Discovery
Matrix Recipes for Hard Thresholding Methods
Journal of Mathematical Imaging and Vision
We consider the problem of partitioning a set of m points in n-dimensional Euclidean space into k clusters (usually m and n are variable, while k is fixed), so as to minimize the sum of squared distances between each point and its cluster center. This formulation is usually the objective of the k-means clustering algorithm (Kanungo et al. (2000)). We prove that this problem is NP-hard even for k = 2, and we consider a continuous relaxation of this discrete problem: find the k-dimensional subspace V that minimizes the sum of squared distances to V of the m points. This relaxation can be solved by computing the Singular Value Decomposition (SVD) of the m × n matrix A that represents the m points; this solution can be used to get a 2-approximation algorithm for the original problem. We then argue that in fact the relaxation provides a generalized clustering which is useful in its own right.

Finally, we show that the SVD of a random submatrix of a given matrix, chosen according to a suitable probability distribution, provides an approximation to the SVD of the whole matrix, thus yielding a very fast randomized algorithm. We expect this algorithm to be the main contribution of this paper, since it can be applied to problems of very large size which typically arise in modern applications.
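The two ideas in the abstract, the subspace relaxation and the SVD of a random submatrix, can be illustrated with a minimal NumPy sketch. This is an illustration under stated assumptions, not the paper's algorithm: the function names are hypothetical, the subspace is taken through the origin, and the abstract does not say which "suitable probability distribution" is used, so sampling rows with probability proportional to their squared norm is assumed here.

```python
import numpy as np


def best_subspace_cost(A, k):
    """Sum of squared distances from the rows of A to the best
    k-dimensional subspace through the origin, read off the SVD:
    it equals the sum of the squared singular values beyond the k-th."""
    sigma = np.linalg.svd(A, compute_uv=False)
    return float(np.sum(sigma[k:] ** 2))


def kmeans_cost(A, labels, k):
    """Discrete objective: sum of squared distances between each
    point (row of A) and the centroid of its assigned cluster."""
    cost = 0.0
    for j in range(k):
        pts = A[labels == j]
        if len(pts) > 0:
            cost += np.sum((pts - pts.mean(axis=0)) ** 2)
    return float(cost)


def sampled_right_singular_vectors(A, k, s, rng):
    """Approximate the top-k right singular vectors of A from s sampled rows.

    ASSUMPTION: rows are drawn with probability proportional to their
    squared Euclidean norm and rescaled so that S.T @ S is an unbiased
    estimate of A.T @ A. The abstract only says 'a suitable probability
    distribution'.
    """
    p = np.sum(A ** 2, axis=1)
    p = p / p.sum()
    idx = rng.choice(A.shape[0], size=s, replace=True, p=p)
    S = A[idx] / np.sqrt(s * p[idx])[:, None]  # rescaled s x n submatrix
    _, _, Vt = np.linalg.svd(S, full_matrices=False)
    return Vt[:k]  # k x n, rows are orthonormal


# Hypothetical usage on synthetic data.
rng = np.random.default_rng(0)
A = rng.normal(size=(200, 10))
labels = rng.integers(0, 2, size=200)   # an arbitrary 2-clustering
relaxed = best_subspace_cost(A, 2)      # never exceeds the discrete cost
discrete = kmeans_cost(A, labels, 2)
V = sampled_right_singular_vectors(A, 2, 40, rng)
```

The relaxed cost is a lower bound on the discrete cost for any clustering, because the k cluster centers span a subspace of dimension at most k and each point is at least as far from its center as from that subspace. The sampled routine only needs the SVD of an s × n matrix, which is where the speed-up for large m would come from.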