Instability and cluster stability variance for real clusterings
Information Sciences: an International Journal
Because of its conceptual simplicity, k-means is one of the most commonly used clustering algorithms. However, how close its result comes to the global optimum depends heavily on both the choice of k and the choice of the initial cluster centers. Mean Shift clustering, by contrast, does not require a priori knowledge of the number of clusters. Furthermore, it finds the modes of the underlying probability density function of the observations, which are natural candidates for the initial cluster centers of k-means. We present a Mean Shift-based initialization method for k-means. A comparative study of the proposed method and other initialization methods is performed on two real-life problems with very large amounts of data: Facility Location and Molecular Dynamics. In the study, the proposed initialization method outperforms the other methods in terms of clustering performance.
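The abstract does not give the details of the proposed method, so the following is only a minimal sketch of the general idea: run Mean Shift to locate density modes, then use those modes as the starting centers for k-means. The function names `mean_shift_modes` and `kmeans`, the flat (uniform) kernel, the merge threshold of half the bandwidth, and the toy data are all assumptions made for illustration, not the authors' implementation.

```python
import math

def mean_shift_modes(points, bandwidth, tol=1e-6, max_iter=100):
    """Flat-kernel mean shift (an assumed kernel choice).

    Each point is moved repeatedly to the mean of all points within
    `bandwidth` of it until it stops moving; converged positions closer
    than bandwidth / 2 to an already found mode are merged into it.
    """
    modes = []
    for p in points:
        x = p
        for _ in range(max_iter):
            # Neighbours of the current position under the flat kernel.
            nbrs = [q for q in points if math.dist(x, q) <= bandwidth]
            new_x = tuple(sum(c) / len(nbrs) for c in zip(*nbrs))
            shift = math.dist(new_x, x)
            x = new_x
            if shift < tol:
                break
        if all(math.dist(x, m) > bandwidth / 2 for m in modes):
            modes.append(x)
    return modes

def nearest(p, centers):
    """Index of the center closest to point p."""
    return min(range(len(centers)), key=lambda i: math.dist(p, centers[i]))

def kmeans(points, init_centers, max_iter=100):
    """Plain Lloyd iterations started from the given centers."""
    centers = list(init_centers)
    for _ in range(max_iter):
        clusters = [[] for _ in centers]
        for p in points:
            clusters[nearest(p, centers)].append(p)
        # An empty cluster keeps its previous center.
        new_centers = [
            tuple(sum(c) / len(cl) for c in zip(*cl)) if cl else centers[i]
            for i, cl in enumerate(clusters)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    labels = [nearest(p, centers) for p in points]
    return centers, labels

# Toy data (hypothetical): two well-separated groups in the plane.
points = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3),
          (5.0, 5.0), (5.2, 4.9), (4.9, 5.1)]

modes = mean_shift_modes(points, bandwidth=1.0)   # k is implied by len(modes)
centers, labels = kmeans(points, init_centers=modes)
```

In this sketch the number of clusters is never supplied by hand: it falls out of the number of modes Mean Shift finds, and k-means only refines those starting positions. The bandwidth remains a free parameter, and the pairwise neighbour search is quadratic in the number of points, so a sketch like this would need a spatial index or subsampling before it could be applied to very large data sets.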