A Deterministic Method for Initializing K-Means Clustering

Authors:
Ting Su; Jennifer Dy
Affiliations:
Northeastern University; Northeastern University
Venue:
ICTAI '04 Proceedings of the 16th IEEE International Conference on Tools with Artificial Intelligence
Year:
2004

Abstract

The performance of K-means clustering depends on the initial guess of the partition. In this paper, we motivate, both theoretically and experimentally, the use of a deterministic divisive hierarchical method for initialization, which we refer to as PCA-Part (Principal Components Analysis Partitioning). K-means clustering minimizes the SSE (sum-squared-error) criterion. The first principal direction (the eigenvector corresponding to the largest eigenvalue of the covariance matrix) is the direction that contributes the largest share of the SSE. Hence, the first principal direction is a good candidate direction onto which to project a cluster when splitting it. This is the basis of the PCA-Part initialization method. Our experiments show that PCA-Part generally leads K-means to clusters whose SSE values are close to the minimum SSE obtained over one hundred random-start runs. In addition, this deterministic initialization often lets K-means converge faster (in fewer iterations) than random initialization methods. Finally, we show theoretically, and confirm experimentally on synthetic data, when PCA-Part may fail.
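The splitting idea can be illustrated with a short sketch. For a cluster of n points whose covariance matrix (normalized by n) has eigenvalues λ1 ≥ … ≥ λd, the SSE equals n(λ1 + … + λd), and the spread along the first principal direction accounts for the largest term, nλ1. The code below is a minimal sketch in plain Python, not the authors' implementation. It assumes that (1) the cluster with the largest SSE is split next and (2) each cut is a hyperplane through the cluster centroid, orthogonal to that cluster's first principal direction. The abstract does not spell out either detail. The function names `pca_part`, `first_principal_direction`, `sse`, and `kmeans` are hypothetical, and the leading eigenvector is found by power iteration rather than a library eigensolver.

```python
import math
from typing import List, Sequence, Tuple

Point = List[float]


def centroid(points: Sequence[Point]) -> Point:
    """Coordinate-wise mean of a non-empty list of points."""
    n, d = len(points), len(points[0])
    return [sum(p[j] for p in points) / n for j in range(d)]


def sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def sse(points: Sequence[Point]) -> float:
    """Sum of squared distances from each point to the cluster centroid."""
    c = centroid(points)
    return sum(sq_dist(p, c) for p in points)


def first_principal_direction(points: Sequence[Point], iters: int = 100) -> Point:
    """Unit leading eigenvector of the scatter matrix (covariance times n).

    Power iteration is run from every standard basis vector, and the result with
    the largest Rayleigh quotient is kept, so a start vector that happens to be
    orthogonal to the leading eigenvector cannot stall the search.
    """
    c = centroid(points)
    d = len(c)
    centered = [[x - cj for x, cj in zip(p, c)] for p in points]
    S = [[sum(v[i] * v[j] for v in centered) for j in range(d)] for i in range(d)]

    def matvec(v: Point) -> Point:
        return [sum(S[i][j] * v[j] for j in range(d)) for i in range(d)]

    best, best_val = [], -1.0
    for start in range(d):
        v = [1.0 if j == start else 0.0 for j in range(d)]
        for _ in range(iters):
            w = matvec(v)
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        val = sum(a * b for a, b in zip(v, matvec(v)))  # Rayleigh quotient; v is unit length
        if val > best_val:
            best, best_val = v, val
    return best


def pca_part(points: Sequence[Point], k: int) -> List[Point]:
    """Hypothetical PCA-Part sketch: split divisively until k clusters exist."""
    clusters: List[List[Point]] = [list(points)]
    while len(clusters) < k:
        # Assumption: the cluster with the largest SSE is split next.
        cluster = clusters.pop(max(range(len(clusters)), key=lambda i: sse(clusters[i])))
        c = centroid(cluster)
        u = first_principal_direction(cluster)
        # Cut with the hyperplane through the centroid orthogonal to u.
        left, right = [], []
        for p in cluster:
            proj = sum((x - cj) * uj for x, cj, uj in zip(p, c, u))
            (left if proj <= 0.0 else right).append(p)
        if not left or not right:
            raise ValueError("cluster cannot be split further")
        clusters += [left, right]
    # The centroids of the final partition seed K-means.
    return [centroid(cl) for cl in clusters]


def kmeans(points: Sequence[Point], centers: Sequence[Point],
           max_iter: int = 100) -> Tuple[List[Point], int]:
    """Plain Lloyd iterations; returns (final centers, iterations run)."""
    centers = [list(c) for c in centers]
    for it in range(1, max_iter + 1):
        groups: List[List[Point]] = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda j: sq_dist(p, centers[j]))
            groups[nearest].append(p)
        # Keep the old center if a cluster becomes empty.
        new = [centroid(g) if g else centers[j] for j, g in enumerate(groups)]
        if new == centers:  # no center moved: converged
            return centers, it
        centers = new
    return centers, max_iter


if __name__ == "__main__":
    data = [[0, 0], [0, 1], [1, 0], [1, 1], [10, 10], [10, 11], [11, 10], [11, 11]]
    init = pca_part(data, 2)
    final, n_iter = kmeans(data, init)
    print(init, n_iter)
```

On the two well-separated toy blobs, the first principal direction is the diagonal, so the single cut already separates the blobs. K-means then stops after its first assignment pass. Because every step is deterministic, repeated runs give the same seeds, which is the property the abstract contrasts with random starts.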