Testing of clustering

Authors:
N. Alon;S. Dar;M. Parnas;D. Ron
Venue:
FOCS '00 Proceedings of the 41st Annual Symposium on Foundations of Computer Science
Year:
2000

Cited by 26

Sublinear time approximate clustering

SODA '01 Proceedings of the twelfth annual ACM-SIAM symposium on Discrete algorithms
Soft kinetic data structures

SODA '01 Proceedings of the twelfth annual ACM-SIAM symposium on Discrete algorithms
Testing metric properties

STOC '01 Proceedings of the thirty-third annual ACM symposium on Theory of computing
Approximate clustering via core-sets

STOC '02 Proceedings of the thirty-fourth annual ACM symposium on Theory of computing
Projective clustering in high dimensions using core-sets

Proceedings of the eighteenth annual symposium on Computational geometry
Exact and approximate testing/correcting of algebraic functions: a survey

Theoretical aspects of computer science
Approximating the Minimum Spanning Tree Weight in Sublinear Time

ICALP '01 Proceedings of the 28th International Colloquium on Automata, Languages and Programming
Search and Classification of High Dimensional Data

APPROX '02 Proceedings of the 5th International Workshop on Approximation Algorithms for Combinatorial Optimization
Exact and Approximate Testing/Correcting of Algebraic Functions: A Survey

Theoretical Aspects of Computer Science, Advanced Lectures [First Summer School on Theoretical Aspects of Computer Science, Tehran, Iran, July 2000]
Property Testing with Geometric Queries

ESA '01 Proceedings of the 9th Annual European Symposium on Algorithms
Better streaming algorithms for clustering problems

Proceedings of the thirty-fifth annual ACM symposium on Theory of computing
Approximation schemes for clustering problems

Proceedings of the thirty-fifth annual ACM symposium on Theory of computing
Testing subgraphs in directed graphs

Proceedings of the thirty-fifth annual ACM symposium on Theory of computing
A characterization of easily testable induced subgraphs

SODA '04 Proceedings of the fifteenth annual ACM-SIAM symposium on Discrete algorithms
Approximate minimum enclosing balls in high dimensions using core-sets

Journal of Experimental Algorithmics (JEA)
Estimating the weight of metric minimum spanning trees in sublinear-time

STOC '04 Proceedings of the thirty-sixth annual ACM symposium on Theory of computing
Subspace clustering for high dimensional data: a review

ACM SIGKDD Explorations Newsletter - Special issue on learning from imbalanced datasets
A k-Median Algorithm with Running Time Independent of Data Size

Machine Learning
Testing subgraphs in directed graphs

Journal of Computer and System Sciences - Special issue: STOC 2003
Self-improving algorithms

SODA '06 Proceedings of the seventeenth annual ACM-SIAM symposium on Discrete algorithms
A combinatorial characterization of the testable graph properties: it's all about regularity

Proceedings of the thirty-eighth annual ACM symposium on Theory of computing
Online geometric reconstruction

Proceedings of the twenty-second annual symposium on Computational geometry
A Characterization of Easily Testable Induced Subgraphs

Combinatorics, Probability and Computing
Online geometric reconstruction

Journal of the ACM (JACM)
Coresets for discrete integration and clustering

FSTTCS'06 Proceedings of the 26th international conference on Foundations of Software Technology and Theoretical Computer Science
Clustering under approximation stability

Journal of the ACM (JACM)

Abstract

A set $X$ of points in $\mathbb{R}^d$ is $(k,b)$-clusterable if $X$ can be partitioned into $k$ subsets (clusters) so that the diameter (alternatively, the radius) of each cluster is at most $b$. We present algorithms that, by sampling from a set $X$, distinguish between the case that $X$ is $(k,b)$-clusterable and the case that $X$ is $\epsilon$-far from being $(k,b')$-clusterable, for any given $0 < \epsilon \le 1$ and for $b' \ge b$. By $\epsilon$-far from being $(k,b')$-clusterable we mean that more than $\epsilon \cdot |X|$ points must be removed from $X$ for it to become $(k,b')$-clusterable. We give algorithms for a variety of cost measures that use a sample of size independent of $|X|$ and polynomial in $k$ and $1/\epsilon$. Our algorithms can also be used to find approximately good clusterings, namely clusterings of all but an $\epsilon$-fraction of the points in $X$ that have optimal (or close to optimal) cost. The benefit of our algorithms is that they construct an implicit representation of such clusterings in time independent of $|X|$. That is, without actually having to partition all points in $X$, the implicit representation can be used to answer queries about the cluster to which any given point belongs.
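
The sample-and-check idea in the abstract can be illustrated with a minimal Python sketch. It makes several assumptions that are not taken from the paper: points lie on the real line ($d = 1$), the cost measure is the diameter, $b' = b$, and the sample size is simply passed in by the caller, since the abstract states only that it is polynomial in $k$ and $1/\epsilon$. The function names `greedy_interval_cover`, `looks_clusterable`, and `cluster_of` are hypothetical and chosen for illustration only.

```python
import random
from typing import List, Optional, Sequence


def greedy_interval_cover(points: Sequence[float], b: float) -> List[float]:
    """Left endpoints of the fewest length-b intervals covering `points` (1-D only)."""
    starts: List[float] = []
    for p in sorted(points):
        # Open a new cluster at the leftmost point not yet covered.
        if not starts or p > starts[-1] + b:
            starts.append(p)
    return starts


def looks_clusterable(X: Sequence[float], k: int, b: float,
                      sample_size: int,
                      rng: Optional[random.Random] = None) -> bool:
    """Accept iff a uniform sample of X (with replacement) is (k, b)-clusterable.

    If X itself is (k, b)-clusterable, every sample is too, so this never
    rejects a clusterable input. `sample_size` is an assumed parameter; the
    bound needed for a given epsilon is not derived here.
    """
    rng = rng or random.Random()
    sample = [rng.choice(X) for _ in range(sample_size)]
    return len(greedy_interval_cover(sample, b)) <= k


def cluster_of(x: float, starts: Sequence[float], b: float) -> Optional[int]:
    """Answer a membership query from the implicit representation `starts`."""
    for i, s in enumerate(starts):
        if s <= x <= s + b:
            return i
    return None  # x falls outside every cluster found on the sample
```

In this sketch the list of interval start points plays the role of the implicit representation: it is computed from the sample alone, and `cluster_of` answers a query for any point without partitioning all of $X$.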