Linear time algorithms for clustering problems in any dimensions

Authors:
Amit Kumar;Yogish Sabharwal;Sandeep Sen
Affiliations:
Dept of Comp Sc & Engg, Indian Institute of Technology, New Delhi, India;IBM India Research Lab, Block-I, IIT Delhi, Hauz Khas, New Delhi, India;Dept of Comp Sc & Engg, Indian Institute of Technology, Kharagpur, India
Venue:
ICALP'05 Proceedings of the 32nd international conference on Automata, Languages and Programming
Year:
2005

Citing 14
Cited 10

Color indexing

International Journal of Computer Vision
Applications of weighted Voronoi diagrams and randomization to variance-based k-clustering: (extended abstract)

SCG '94 Proceedings of the tenth annual symposium on Computational geometry
Efficient and effective querying by image content

Journal of Intelligent Information Systems - Special issue: advances in visual information management systems
Approximation algorithms for geometric problems

Approximation algorithms for NP-hard problems
Approximation schemes for Euclidean k-medians and related problems

STOC '98 Proceedings of the thirtieth annual ACM symposium on Theory of computing
Syntactic clustering of the Web

Selected papers from the sixth international conference on World Wide Web
Approximate clustering via core-sets

STOC '02 Proceedings of the thirty-fourth annual ACM symposium on Theory of computing
A Nearly Linear-Time Approximation Scheme for the Euclidean k-median Problem

ESA '99 Proceedings of the 7th Annual European Symposium on Algorithms
Approximation schemes for clustering problems

Proceedings of the thirty-fifth annual ACM symposium on Theory of computing
Polynomial time approximation schemes for Euclidean TSP and other geometric problems

FOCS '96 Proceedings of the 37th Annual Symposium on Foundations of Computer Science
High-dimensional computational geometry

High-dimensional computational geometry
Pattern Classification (2nd Edition)

Pattern Classification (2nd Edition)
On coresets for k-means and k-median clustering

STOC '04 Proceedings of the thirty-sixth annual ACM symposium on Theory of computing
A Simple Linear Time (1+ε)-Approximation Algorithm for k-Means Clustering in Any Dimensions

FOCS '04 Proceedings of the 45th Annual IEEE Symposium on Foundations of Computer Science

On k-Median clustering in high dimensions

SODA '06 Proceedings of the seventeenth annual ACM-SIAM symposium on Discrete algorithms
A PTAS for k-means clustering based on weak coresets

SCG '07 Proceedings of the twenty-third annual symposium on Computational geometry
Clustering for metric and non-metric distance measures

Proceedings of the nineteenth annual ACM-SIAM symposium on Discrete algorithms
Small space representations for metric min-sum k-clustering and their applications

STACS'07 Proceedings of the 24th annual conference on Theoretical aspects of computer science
The priority k-median problem

FSTTCS'07 Proceedings of the 27th international conference on Foundations of software technology and theoretical computer science
Clustering for metric and nonmetric distance measures

ACM Transactions on Algorithms (TALG)
Sublinear-time algorithms

Property testing
Sublinear-time algorithms

Property testing
Bregman clustering for separable instances

SWAT'10 Proceedings of the 12th Scandinavian conference on Algorithm Theory
Active clustering of biological sequences

The Journal of Machine Learning Research

Abstract

We generalize the k-means algorithm presented by the authors [14] and show that the resulting algorithm can solve a larger class of clustering problems that satisfy certain properties (existence of a random sampling procedure and tightness). We prove these properties for the k-median and the discrete k-means clustering problems, resulting in O(2^{(k/ε)^{O(1)}} dn) time (1+ε)-approximation algorithms for these problems. These are the first algorithms for these problems linear in the size of the input (nd for n points in d dimensions), independent of dimensions in the exponent, assuming k and ε to be fixed. A key ingredient of the k-median result is a (1+ε)-approximation algorithm for the 1-median problem which has running time O(2^{(1/ε)^{O(1)}} d). The previous best known algorithm for this problem had linear running time.
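
For orientation, the objectives named in the abstract can be written down in a few lines. The Python sketch below is illustrative only and is not the authors' algorithm. It defines the k-median cost (sum of distances to the nearest center) and the k-means cost (sum of squared distances). It then shows one simple way a 1-median estimate can be computed from a random sample whose size does not depend on n. The function names, the use of Weiszfeld's iteration on the sample, and the choice of sample size are all assumptions made for this example, and the sketch makes no claim to the (1+ε) guarantee or the running time stated in the abstract.

```python
import math
import random


def kmedian_cost(points, centers):
    """k-median objective: sum of Euclidean distances to the nearest center."""
    return sum(min(math.dist(p, c) for c in centers) for p in points)


def kmeans_cost(points, centers):
    """k-means objective: sum of squared Euclidean distances to the nearest center.

    In the discrete variant the centers must be chosen among the input points.
    """
    return sum(min(math.dist(p, c) ** 2 for c in centers) for p in points)


def weiszfeld(points, iterations=100, tol=1e-9):
    """Approximate the geometric median (1-median) by Weiszfeld's iteration.

    Illustrative stand-in, not the procedure from the paper.
    """
    d = len(points[0])
    # Start from the centroid of the points.
    y = [sum(p[i] for p in points) / len(points) for i in range(d)]
    for _ in range(iterations):
        num = [0.0] * d
        den = 0.0
        for p in points:
            dist = math.dist(p, y)
            if dist < tol:
                continue  # simplification: skip points that coincide with y
            w = 1.0 / dist
            den += w
            for i in range(d):
                num[i] += w * p[i]
        if den == 0.0:
            break  # every point coincides with the current estimate
        new_y = [v / den for v in num]
        moved = math.dist(new_y, y)
        y = new_y
        if moved < tol:
            break
    return y


def sampled_one_median(points, sample_size, rng):
    """Estimate a 1-median from `sample_size` points drawn with replacement.

    The work done here depends on sample_size and d, not on len(points).
    `sample_size` is a hypothetical parameter; no approximation bound is claimed.
    """
    sample = [rng.choice(points) for _ in range(sample_size)]
    return weiszfeld(sample)
```

A caller would compare `kmedian_cost(points, [sampled_one_median(points, m, rng)])` against the cost of a center computed from all points to see how the sample size `m` affects quality on a given data set.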