A k-Median Algorithm with Running Time Independent of Data Size

Authors:
Adam Meyerson; Liadan O'Callaghan; Serge Plotkin
Affiliations:
Department of Computer Science, University of California, Los Angeles. awm@cs.ucla.edu; Department of Computer Science, Stanford University. loc@cs.stanford.edu; Department of Computer Science, Stanford University. plotkin@theory.stanford.edu
Venue:
Machine Learning
Year:
2004

Abstract

We give a sampling-based algorithm for the k-Median problem, with running time $O\left(k \left(\frac{k^2}{\epsilon} \log k\right)^2 \log\left(\frac{k}{\epsilon} \log k\right)\right)$, where k is the desired number of clusters and $\epsilon$ is a confidence parameter. This is the first k-Median algorithm with fully polynomial running time that is independent of n, the size of the data set. It gives a solution that is, with high probability, an O(1)-approximation, if each cluster in some optimal solution has $\Omega\left(\frac{n\epsilon}{k}\right)$ points. We also give weakly-polynomial-time algorithms for this problem and a relaxed version of k-Median in which a small fraction of outliers can be excluded. We give near-matching lower bounds showing that this assumption about cluster size is necessary. We also present a related algorithm for finding a clustering that excludes a small number of outliers.
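The general idea of sampling-based clustering can be illustrated with a minimal sketch. The abstract does not spell out the paper's procedure, so the code below is not the authors' algorithm: it draws a uniform sample and runs a plain single-swap local search on it as a stand-in k-Median solver. The names `sampled_kmedian`, `local_search_kmedian`, and `kmedian_cost`, and the free parameter `sample_size`, are hypothetical and chosen only for illustration.

```python
import random


def kmedian_cost(points, centers, dist):
    """Sum over points of the distance to the nearest center."""
    return sum(min(dist(p, c) for c in centers) for p in points)


def local_search_kmedian(points, k, dist, rng):
    """Single-swap local search (a stand-in solver, not the paper's method).

    Starts from k random centers and keeps applying any swap of one center
    for one non-center point that strictly lowers the cost.
    """
    centers = rng.sample(points, k)
    best = kmedian_cost(points, centers, dist)
    improved = True
    while improved:
        improved = False
        for i in range(k):
            for p in points:
                if p in centers:
                    continue
                trial = centers[:i] + [p] + centers[i + 1:]
                cost = kmedian_cost(points, trial, dist)
                if cost < best:
                    centers, best = trial, cost
                    improved = True
    return centers


def sampled_kmedian(data, k, sample_size, dist, seed=0):
    """Cluster a uniform sample (drawn with replacement) instead of all data.

    sample_size is an assumed free parameter here; the abstract only implies
    that it can be chosen as a function of k and epsilon, not of len(data).
    """
    rng = random.Random(seed)
    sample = [data[rng.randrange(len(data))] for _ in range(sample_size)]
    return local_search_kmedian(sample, k, dist, rng)


if __name__ == "__main__":
    # Two well-separated clusters on a line, 500 points each.
    data = [i / 500 for i in range(500)] + [100 + i / 500 for i in range(500)]
    print(sorted(sampled_kmedian(data, 2, 40, lambda a, b: abs(a - b))))
```

After the sample is drawn, all work is done on `sample_size` points, so the cost of the clustering step depends on `k` and `sample_size` but not on the number of input points. This also hints at why a minimum cluster size matters: a cluster holding only a tiny fraction of the data is likely to be missed by a small uniform sample.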