A set X of points in $\mathbb{R}^d$ is (k,b)-clusterable if X can be partitioned into k subsets (clusters) so that the diameter (alternatively, the radius) of each cluster is at most b. We present algorithms that, by sampling from a set X, distinguish between the case that X is (k,b)-clusterable and the case that X is $\epsilon$-far from being (k,b')-clusterable, for any given $0 < \epsilon \le 1$ and for $b' \ge b$. By $\epsilon$-far from being (k,b')-clusterable we mean that more than $\epsilon \cdot |X|$ points must be removed from X for it to become (k,b')-clusterable. We give algorithms for a variety of cost measures that use a sample of size independent of |X| and polynomial in k and $1/\epsilon$. Our algorithms can also be used to find approximately good clusterings, that is, clusterings of all but an $\epsilon$-fraction of the points in X that have optimal (or close to optimal) cost. The benefit of our algorithms is that they construct an implicit representation of such clusterings in time independent of |X|. In other words, without actually partitioning all points in X, the implicit representation can be used to answer queries about which cluster any given point belongs to.