Based on n randomly drawn vectors in a separable Hilbert space, one may construct a k-means clustering scheme by minimizing an empirical squared error. We investigate the risk of such a clustering scheme, defined as the expected squared distance of a random vector X from the set of cluster centers. Our main result states that, for an almost surely bounded X, the expected excess clustering risk is O(√(1/n)). Since clustering in high (or even infinite)-dimensional spaces may lead to severe computational problems, we examine the properties of a dimension reduction strategy for clustering based on Johnson-Lindenstrauss-type random projections. Our results reflect a tradeoff between accuracy and computational complexity when one uses k-means clustering after random projection of the data to a low-dimensional space. We argue that random projections work better than other simplistic dimension reduction schemes.
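
To make the projection-then-cluster pipeline concrete, here is a minimal sketch in Python/NumPy. It is not the paper's construction, only an illustration under simple assumptions: the Johnson-Lindenstrauss-type map is taken to be a scaled Gaussian random matrix, and empirical squared-error minimization is approximated by plain Lloyd iterations; the function names and parameters (random_projection, kmeans, d, k, n_iter) are hypothetical.

```python
import numpy as np

def random_projection(X, d, rng):
    """Map n points in R^D to R^d with a Gaussian Johnson-Lindenstrauss-type
    matrix, scaled by 1/sqrt(d) so squared norms are preserved in expectation."""
    D = X.shape[1]
    R = rng.normal(size=(D, d)) / np.sqrt(d)
    return X @ R

def kmeans(X, k, rng, n_iter=100):
    """Plain Lloyd iterations minimizing the empirical squared error."""
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assign each point to its nearest cluster center.
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = d2.argmin(axis=1)
        # Recompute each center as the mean of its cluster
        # (keep the old center if a cluster becomes empty).
        new = np.array([X[labels == j].mean(axis=0) if (labels == j).any()
                        else centers[j] for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return centers, labels

# Usage: cluster high-dimensional data after projecting to d dimensions.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 500))          # n = 1000 points in D = 500 dimensions
Xp = random_projection(X, d=20, rng=rng)  # reduce to d = 20 before clustering
centers, labels = kmeans(Xp, k=5, rng=rng)
```

The choice of d governs the tradeoff described above: a smaller d makes each Lloyd iteration cheaper, while a larger d keeps pairwise squared distances, and hence the clustering risk, closer to their values in the original space.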