The min-sum $k$-clustering problem is to partition a metric space $(P, d)$ into $k$ clusters $C_1, \dots, C_k \subseteq P$ such that $\sum_{i=1}^{k} \sum_{p,q \in C_i} d(p,q)$ is minimized. We show the first efficient construction of a coreset for this problem. Our coreset construction is based on a new adaptive sampling algorithm. Using our coresets we obtain three main algorithmic results. The first result is a sublinear-time $(4+\varepsilon)$-approximation algorithm for the min-sum $k$-clustering problem in metric spaces. The running time of this algorithm is $\tilde{O}(n)$ for any constant $k$ and $\varepsilon$, and it is $o(n^2)$ for all $k = o(\log n / \log\log n)$. Since the description size of the input is $\Theta(n^2)$, this is sublinear in the input size. Our second result is the first pass-efficient data streaming algorithm for min-sum $k$-clustering in the distance oracle model, i.e., an algorithm that uses $\mathrm{poly}(\log n, k)$ space and makes two passes over the input point set arriving as a data stream. Our third result is a sublinear-time polylogarithmic-factor approximation algorithm for the min-sum $k$-clustering problem for arbitrary values of $k$. To develop the coresets, we introduce the concept of $\alpha$-preserving metric embeddings. Such an embedding satisfies two properties: (a) the distance between any pair of points does not decrease, and (b) the cost of an optimal solution for the considered problem on input $(P, d')$ is within a constant factor of the optimal solution on input $(P, d)$. In other words, the idea is to find a metric embedding into a (structurally simpler) metric space that approximates the original metric up to a factor of $\alpha$ with respect to a certain problem. We believe that this concept is an interesting generalization of coresets.
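To make the objective concrete, here is a minimal Python sketch that evaluates the min-sum cost of a given partition and finds the optimum by exhaustive search on a toy instance. It is not the coreset or sampling algorithm described in the abstract. The function names `min_sum_cost` and `brute_force_min_sum`, the five-point example, and the choice to sum over ordered pairs (so each unordered pair is counted twice) are all assumptions made for illustration.

```python
from itertools import product


def min_sum_cost(clusters, d):
    """Sum of d(p, q) over all ordered pairs (p, q) inside each cluster.

    Assumption: ordered pairs are used, so every unordered pair is counted
    twice. This scales the cost by 2 but does not change which partition
    is optimal.
    """
    return sum(d(p, q) for cluster in clusters for p in cluster for q in cluster)


def brute_force_min_sum(points, k, d):
    """Try every assignment of points to k labels (exponential time).

    Only meant for tiny inputs, as a reference for what is being minimized.
    Empty clusters are allowed.
    """
    best_cost, best_clusters = float("inf"), None
    for labels in product(range(k), repeat=len(points)):
        clusters = [
            [p for p, label in zip(points, labels) if label == i]
            for i in range(k)
        ]
        cost = min_sum_cost(clusters, d)
        if cost < best_cost:
            best_cost, best_clusters = cost, clusters
    return best_cost, best_clusters


def line_metric(p, q):
    """Hypothetical toy metric: points on the real line, d(p, q) = |p - q|."""
    return abs(p - q)


points = [0.0, 1.0, 2.0, 10.0, 11.0]
best_cost, best_clusters = brute_force_min_sum(points, 2, line_metric)
print(best_cost, best_clusters)
```

On this instance the search separates the three points near 0 from the two points near 10. The exhaustive loop examines all k^n labelings, which is exactly the kind of cost that approximation algorithms for this problem are designed to avoid.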