Bi-criteria linear-time approximations for generalized k-mean/median/center

Authors:
Dan Feldman;Amos Fiat;Micha Sharir;Danny Segev
Affiliations:
Tel Aviv University, Tel Aviv, Israel;Tel Aviv University, Tel Aviv, Israel;Tel Aviv University, Tel Aviv, Israel;Tel Aviv University, Tel Aviv, Israel
Venue:
SCG '07 Proceedings of the twenty-third annual symposium on Computational geometry
Year:
2007

Abstract

We consider the problem of approximating a set P of n points in R^d by a collection of j-dimensional flats, and extensions thereof, under the standard median / mean / center measures, in which we wish to minimize, respectively, the sum of the distances from each point of P to its nearest flat, the sum of the squares of these distances, or the maximal such distance. Such problems cannot be approximated unless P=NP but do allow bi-criteria approximations where one allows some leeway in both the number of flats and the quality of the objective function. We give a very simple bi-criteria approximation algorithm, which produces at most α(k,j,n) = (k j log n)^{O(j)} flats, and whose objective value exceeds the optimal objective value for any k j-dimensional flats by a factor of no more than β(j) = 2^{O(j)}. Given this bi-criteria approximation, we can use it to reduce the approximation factor arbitrarily, at the cost of increasing the number of flats. Our algorithm has many advantages over previous work, in that it is much more widely applicable (wider set of objective functions and classes of clusters) and much more efficient, reducing the running time bound from O(n^{Poly(k,j)}) to nd · (jk)^{O(j)}. Our algorithm is randomized and succeeds with probability 1/2 (easily boosted to probabilities arbitrarily close to 1).
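
To make the three objectives concrete, the following is a minimal sketch in Python that evaluates the median, mean, and center costs of a given collection of flats. It is not the paper's approximation algorithm; it only computes the quantities the abstract says are being minimized. The function names (dist_to_flat, clustering_costs) and the representation of a flat as an origin point plus an orthonormal list of direction vectors are assumptions made for illustration.

```python
import math

def dist_to_flat(p, origin, basis):
    """Euclidean distance from point p to the flat origin + span(basis).

    Assumption: basis is an orthonormal list of j direction vectors
    (an empty list gives a 0-dimensional flat, i.e. a single point).
    """
    diff = [pi - oi for pi, oi in zip(p, origin)]
    residual = diff[:]
    # Remove the component of diff that lies along each direction of the flat.
    for u in basis:
        coeff = sum(d * ui for d, ui in zip(diff, u))
        residual = [r - coeff * ui for r, ui in zip(residual, u)]
    return math.sqrt(sum(r * r for r in residual))

def clustering_costs(points, flats):
    """Return the median, mean, and center costs of points against flats."""
    # Each point is charged the distance to its nearest flat.
    nearest = [min(dist_to_flat(p, o, b) for o, b in flats) for p in points]
    return {
        "median": sum(nearest),               # sum of distances
        "mean": sum(d * d for d in nearest),  # sum of squared distances
        "center": max(nearest),               # maximal distance
    }

# Hypothetical example: three points in the plane and two 1-dimensional flats.
points = [(0.0, 1.0), (2.0, -2.0), (5.0, 3.0)]
flats = [
    ((0.0, 0.0), [(1.0, 0.0)]),  # the line y = 0
    ((0.0, 4.0), [(1.0, 0.0)]),  # the line y = 4
]
costs = clustering_costs(points, flats)
print(costs)
```

In this example the nearest-flat distances are 1, 2, and 1, so the median cost is 4, the mean cost is 6, and the center cost is 2. A bi-criteria approximation in the sense of the abstract would be allowed to use more than k flats and to exceed the optimal value of the chosen cost by a bounded factor.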