The performance of K-means clustering depends on the initial guess of the partition. In this paper, we motivate, theoretically and experimentally, the use of a deterministic divisive hierarchical method for initialization, which we refer to as PCA-Part (Principal Components Analysis Partitioning). K-means minimizes the sum-squared-error (SSE) criterion, and the first principal direction (the eigenvector corresponding to the largest eigenvalue of the covariance matrix) is the direction that contributes the largest SSE. A good candidate direction along which to project a cluster for splitting is therefore the first principal direction; this is the basis of the PCA-Part initialization method. Our experiments reveal that PCA-Part generally leads K-means to clusters with SSE values close to the minimum SSE values obtained over one hundred random-start runs. In addition, this deterministic initialization often leads K-means to faster convergence (fewer iterations) than random methods. Finally, we show theoretically, and confirm experimentally on synthetic data, when PCA-Part may fail.
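The divisive scheme described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: it assumes the cluster with the largest SSE is split at its mean along its first principal direction, repeating until K clusters exist, and the resulting centroids are used to seed K-means.

```python
import numpy as np

def pca_part_init(X, k):
    """Sketch of a PCA-Part-style initialization (illustrative, not the
    paper's code): repeatedly split the cluster with the largest SSE
    along its first principal direction, then return the K centroids."""
    clusters = [np.arange(len(X))]  # start with one cluster holding all points
    while len(clusters) < k:
        # Pick the cluster contributing the largest SSE.
        sses = [((X[idx] - X[idx].mean(axis=0)) ** 2).sum() for idx in clusters]
        idx = clusters.pop(int(np.argmax(sses)))
        pts = X[idx]
        mean = pts.mean(axis=0)
        # First principal direction = top eigenvector of the covariance matrix.
        cov = np.cov(pts, rowvar=False)
        eigvals, eigvecs = np.linalg.eigh(cov)
        pc1 = eigvecs[:, -1]
        # Split at the projected mean (projection 0 after centering).
        proj = (pts - mean) @ pc1
        left, right = idx[proj <= 0], idx[proj > 0]
        if len(left) == 0 or len(right) == 0:
            clusters.append(idx)  # degenerate split; stop early
            break
        clusters.extend([left, right])
    return np.array([X[idx].mean(axis=0) for idx in clusters])
```

Because every step (SSE ranking, eigendecomposition, split at the mean) is deterministic, the same data always yields the same seeds, which is what allows the method to be compared against multiple random restarts.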