We propose a method that efficiently clusters high-dimensional data. The method builds on random projection and the K-means algorithm: K-means is applied several times, and the dimensionality of the projected data is increased after each convergence of K-means. We compare the proposed algorithm on four high-dimensional datasets (image, text, and two synthetic) against K-means clustering with a single random projection and against K-means clustering of the original high-dimensional data. In terms of running time, the algorithm is drastically faster than K-means on the original high-dimensional data. In terms of mean squared error, the proposed method reaches a better solution than clustering with a single random projection; more notably, in the experiments performed it also reaches a better solution than clustering the original high-dimensional data.
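The scheme described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes a Gaussian random projection, a farthest-point initialization for the first stage, and a warm start in which each stage's initial centers are the means of the previous stage's clusters recomputed in the newly projected space. The dimension schedule `dims` and the helper names are illustrative choices, not taken from the paper.

```python
import numpy as np

def farthest_init(X, k, rng):
    # Maxmin initialization: first center random, each next center is the
    # point farthest from all centers chosen so far.
    centers = [X[rng.integers(len(X))]]
    for _ in range(k - 1):
        dist = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[dist.argmax()])
    return np.array(centers, dtype=float)

def kmeans(X, k, centers=None, iters=50, rng=None):
    # Plain Lloyd's algorithm; returns labels and final centers.
    rng = np.random.default_rng(0) if rng is None else rng
    if centers is None:
        centers = farthest_init(X, k, rng)
    centers = np.asarray(centers, dtype=float).copy()
    for _ in range(iters):
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = dist.argmin(axis=1)
        for j in range(k):
            pts = X[labels == j]
            if len(pts):
                centers[j] = pts.mean(axis=0)
    return labels, centers

def rp_kmeans(X, k, dims=(5, 20, 80), rng=None):
    # Run K-means on random projections of increasing dimension; the labels
    # found at each stage warm-start the next (hypothetical sketch).
    rng = np.random.default_rng(0) if rng is None else rng
    n, d = X.shape
    labels = None
    for m in dims:
        R = rng.standard_normal((d, m)) / np.sqrt(m)  # Gaussian projection
        Y = X @ R
        if labels is None:
            labels, _ = kmeans(Y, k, rng=rng)
        else:
            # Initial centers: previous clusters' means in the new space;
            # empty clusters fall back to a random point.
            centers = np.vstack([
                Y[labels == j].mean(axis=0) if np.any(labels == j)
                else Y[rng.integers(n)] for j in range(k)])
            labels, _ = kmeans(Y, k, centers=centers, rng=rng)
    return labels
```

Because each stage runs in a space of dimension `m` rather than `d`, the per-iteration cost of the early K-means passes is much lower than in the full space, which is the source of the running-time savings reported above.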