A PTAS for k-means clustering based on weak coresets

Authors:
Dan Feldman;Morteza Monemizadeh;Christian Sohler
Affiliations:
Tel Aviv University, Tel Aviv, Israel;University of Paderborn, Paderborn, Germany;Heinz Nixdorf Institute and Department of Computer Science, Paderborn, Germany
Venue:
SCG '07 Proceedings of the twenty-third annual symposium on Computational geometry
Year:
2007

Abstract

Given a point set $P \subseteq \mathbb{R}^d$, the k-means clustering problem is to find a set $C = (c_1, \ldots, c_k)$ of $k$ points and a partition of $P$ into $k$ clusters $C_1, \ldots, C_k$ such that the sum of squared errors $\sum_{i=1}^{k} \sum_{p \in C_i} \lVert p - c_i \rVert_2^2$ is minimized. For given centers this cost function is minimized by assigning points to the nearest center. The k-means cost function is probably the most widely used cost function in the area of clustering. In this paper we show that every unweighted point set $P$ has a weak $(\varepsilon, k)$-coreset of size $\mathrm{Poly}(k, 1/\varepsilon)$ for the k-means clustering problem, i.e., its size is independent of the cardinality $|P|$ of the point set and the dimension $d$ of the Euclidean space $\mathbb{R}^d$. A weak coreset is a weighted set $S \subseteq P$ together with a set $T$ such that $T$ contains a $(1+\varepsilon)$-approximation for the optimal cluster centers from $P$, and for every set of $k$ centers from $T$ the cost of the centers for $S$ is a $(1 \pm \varepsilon)$-approximation of the cost for $P$. We apply our weak coreset to obtain a PTAS for the k-means clustering problem with running time $O(nkd + d \cdot \mathrm{Poly}(k/\varepsilon) + 2^{\tilde{O}(k/\varepsilon)})$.
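
To make the cost function in the abstract concrete, the following minimal Python sketch evaluates the sum of squared errors for a fixed set of centers by assigning each point to its nearest center, and then evaluates the same cost on a weighted subset. The names `sq_dist` and `kmeans_cost`, the toy points, and the hand-picked weighted subset `S` are all illustrative assumptions. The sketch does not implement the paper's coreset construction, and it checks only one set of centers, whereas the weak coreset guarantee concerns every set of k centers drawn from T.

```python
def sq_dist(p, c):
    """Squared Euclidean distance between two points given as sequences."""
    return sum((pi - ci) ** 2 for pi, ci in zip(p, c))


def kmeans_cost(points, centers, weights=None):
    """Weighted sum of squared distances from each point to its nearest center.

    With weights=None every point has weight 1, which gives the
    unweighted k-means cost described in the abstract.
    """
    if weights is None:
        weights = [1.0] * len(points)
    return sum(
        w * min(sq_dist(p, c) for c in centers)
        for p, w in zip(points, weights)
    )


# Toy input (assumed for illustration): four points in the plane, k = 2.
P = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
C = [(0.0, 1.0), (10.0, 1.0)]
cost_P = kmeans_cost(P, C)

# Hypothetical weighted subset S of P: one representative per cluster,
# each carrying the weight of the two points it stands in for.
S = [(0.0, 0.0), (10.0, 2.0)]
w = [2.0, 2.0]
cost_S = kmeans_cost(S, C, weights=w)

# Relative error of the weighted cost for this one choice of centers.
rel_error = abs(cost_S - cost_P) / cost_P
print(cost_P, cost_S, rel_error)
```

In this toy instance every point lies at squared distance 1 from its nearest center, so the full cost and the weighted cost coincide. For other center sets the two costs would generally differ, which is why the definition restricts the approximation guarantee to centers taken from T.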