Testing of Clustering

Authors:
Noga Alon;Seannie Dar;Michal Parnas;Dana Ron
Venue:
SIAM Journal on Discrete Mathematics
Year:
2003

Abstract

A set X of points in $\Re^d$ is (k,b)-clusterable if X can be partitioned into k subsets (clusters) so that the diameter (alternatively, the radius) of each cluster is at most b. We present algorithms that, by sampling from a set X, distinguish between the case that X is (k,b)-clusterable and the case that X is $\epsilon$-far from being (k,b')-clusterable, for any given $0 < \epsilon \le 1$ and for $b' \ge b$. By $\epsilon$-far from being (k,b')-clusterable we mean that more than $\epsilon\cdot|X|$ points must be removed from X for it to become (k,b')-clusterable. We give algorithms for a variety of cost measures that use a sample of size independent of |X| and polynomial in k and $1/\epsilon$. Our algorithms can also be used to find approximately good clusterings, that is, clusterings of all but an $\epsilon$-fraction of the points in X that have optimal (or close to optimal) cost. The benefit of our algorithms is that they construct an implicit representation of such clusterings in time independent of |X|: without actually partitioning all points in X, the implicit representation can be used to answer queries about the cluster to which any given point belongs.
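
The sample-then-check idea in the abstract can be illustrated with a minimal sketch for the simplest setting: points on the real line (d = 1), diameter cost, and b' = b. This is not the paper's algorithm or its sample-size bound. The function names (`greedy_starts`, `sample_tester`, `cluster_of`) and the `sample_size` parameter are hypothetical and chosen for illustration; in one dimension a greedy left-to-right sweep is enough to check clusterability exactly.

```python
import bisect
import random


def greedy_starts(points, b):
    """Left endpoints of a minimum cover of the points by intervals of length b.

    On the real line, a greedy left-to-right sweep uses the fewest intervals,
    so the points are (k, b)-clusterable by diameter iff len(result) <= k.
    """
    starts = []
    for p in sorted(points):
        if not starts or p - starts[-1] > b:
            starts.append(p)  # p is not covered yet: open a new interval at p
    return starts


def sample_tester(points, k, b, sample_size, rng=None):
    """Accept iff a uniform random sample is (k, b)-clusterable.

    sample_size is a free parameter here (an assumption of this sketch), not
    the bound proved in the paper. Returns (accepted, starts); starts is an
    implicit clustering derived from the sample alone.
    """
    rng = rng or random.Random()
    m = min(sample_size, len(points))
    sample = rng.sample(points, m)
    starts = greedy_starts(sample, b)
    return len(starts) <= k, starts


def cluster_of(x, starts, b):
    """Query the implicit clustering: index of the interval [s, s + b]
    containing x, or None if x lies in no interval (treated as an outlier)."""
    i = bisect.bisect_right(starts, x) - 1
    if i >= 0 and x - starts[i] <= b:
        return i
    return None
```

Because every subset of a (k,b)-clusterable set is itself (k,b)-clusterable, this sketch never rejects a clusterable input; how likely it is to reject an $\epsilon$-far input depends on the chosen `sample_size`. Answering `cluster_of` queries touches only the stored interval endpoints, not the points of X.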