Algorithms for clustering data
Algorithms for clustering data
Machine Learning
ICASSP '97 Proceedings of the 1997 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '97) -Volume 1 - Volume 1
Resampling Method for Unsupervised Estimation of Cluster Validity
Neural Computation
Biological Data Mining
A sober look at clustering stability
COLT'06 Proceedings of the 19th annual conference on Learning Theory
Hi-index | 0.00 |
Clustering is one of the most well known activities in scientific investigation and the object of research in many disciplines, ranging from Statistics to Computer Science. In this beautiful area, one of the most difficult challenges is the model selection problem, i.e., the identification of the correct number of clusters in a dataset. In the last decade, a few novel techniques for model selection, representing a sharp departure from previous ones in statistics, have been proposed and gained prominence for microarray data analysis. Among those, the stability-based methods are the most robust and best performing in terms of prediction, but the slowest in terms of time. Unfortunately, this fascinating and classic area of statistics as model selection, with important practical applications, has received very little attention in terms of algorithmic design and engineering. In this paper, in order to partially fill this gap, we highlight: (A) the first general algorithmic paradigm for stability-based methods for model selection; (B) a novel algorithmic paradigm for the class of stability-based methods for cluster validity, i.e., methods assessing how statistically significant is a given clustering solution; (C) a general algorithmic paradigm that describes heuristic and very effective speed-ups known in the Literature for stability-based model selection methods.