Streaming Algorithms for k-Center Clustering with Outliers and with Anonymity

Authors:
Richard Matthew McCutchen; Samir Khuller
Affiliations:
University of Maryland; University of Maryland
Venue:
APPROX '08 / RANDOM '08 Proceedings of the 11th international workshop, APPROX 2008, and 12th international workshop, RANDOM 2008 on Approximation, Randomization and Combinatorial Optimization: Algorithms and Techniques
Year:
2008

Abstract

Clustering is a common problem in the analysis of large data sets. Streaming algorithms, which make a single pass over the data set using small working memory and produce a clustering comparable in cost to the optimal offline solution, are especially useful. We develop the first streaming algorithms achieving a constant-factor approximation to the cluster radius for two variations of the k-center clustering problem. We give a streaming (4 + ε)-approximation algorithm using O(ε⁻¹kz) memory for the problem with outliers, in which the clustering is allowed to drop up to z of the input points; previous work used a random sampling approach which yields only a bicriteria approximation. We also give a streaming (6 + ε)-approximation algorithm using O(ε⁻¹ ln(ε⁻¹) k + k²) memory for a variation motivated by anonymity considerations in which each cluster must contain at least a certain number of input points.
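
To make the outlier objective concrete, the following is a minimal Python sketch of how the cluster radius could be evaluated for a fixed set of centers when up to z points may be dropped. It is not the streaming algorithm from the paper: it stores every point and only scores a given solution. The function name `radius_with_outliers`, the use of Euclidean distance, and the sample data are assumptions made for illustration.

```python
import math


def radius_with_outliers(points, centers, z):
    """Smallest radius r such that balls of radius r around the given
    centers cover all but at most z of the points (hypothetical helper)."""
    if not centers:
        raise ValueError("need at least one center")
    # Distance from each point to its nearest center (Euclidean, by assumption).
    nearest = sorted(min(math.dist(p, c) for c in centers) for p in points)
    # Dropping the z farthest points leaves the largest remaining distance
    # as the radius; if every point may be dropped, the radius is 0.
    kept = len(nearest) - z
    return nearest[kept - 1] if kept > 0 else 0.0


# Illustrative data: two tight groups plus one far-away point.
points = [(0, 0), (1, 0), (10, 0), (11, 0), (100, 100)]
centers = [(0, 0), (10, 0)]

print(radius_with_outliers(points, centers, z=0))  # dominated by the far point
print(radius_with_outliers(points, centers, z=1))  # far point dropped as an outlier
```

With z = 0 the single distant point forces a large radius, while allowing one outlier brings the radius down to the spread of the two tight groups. This is the sensitivity that the problem with outliers is designed to remove.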