Based on n randomly drawn vectors in a separable Hilbert space, one may construct a k-means clustering scheme by minimizing an empirical squared error. We investigate the risk of such a clustering scheme, defined as the expected squared distance of a random vector X from the set of cluster centers. Our main result states that, for an almost surely bounded X, the expected excess clustering risk is O(√(1/n)). Since clustering in high (or even infinite)-dimensional spaces may lead to severe computational problems, we examine the properties of a dimension reduction strategy for clustering based on Johnson-Lindenstrauss-type random projections. Our results reflect a tradeoff between accuracy and computational complexity when one uses k-means clustering after random projection of the data to a low-dimensional space. We argue that random projections work better than other simplistic dimension reduction schemes.
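
To make the projection-then-cluster pipeline concrete, here is a minimal sketch in Python/NumPy. It is not the paper's construction, only an illustration under simple assumptions: the Johnson-Lindenstrauss-type map is taken to be a scaled Gaussian random matrix, and empirical squared-error minimization is approximated by plain Lloyd iterations; the function names and parameters (random_projection, kmeans, d, k, n_iter) are hypothetical.

```python
import numpy as np

def random_projection(X, d, rng):
    """Map n points in R^D to R^d with a Gaussian Johnson-Lindenstrauss-type
    matrix, scaled by 1/sqrt(d) so squared norms are preserved in expectation."""
    D = X.shape[1]
    R = rng.normal(size=(D, d)) / np.sqrt(d)
    return X @ R

def kmeans(X, k, rng, n_iter=100):
    """Plain Lloyd iterations minimizing the empirical squared error."""
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assign each point to its nearest cluster center.
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = d2.argmin(axis=1)
        # Recompute each center as the mean of its cluster
        # (keep the old center if a cluster becomes empty).
        new = np.array([X[labels == j].mean(axis=0) if (labels == j).any()
                        else centers[j] for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return centers, labels

# Usage: cluster high-dimensional data after projecting to d dimensions.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 500))          # n = 1000 points in D = 500 dimensions
Xp = random_projection(X, d=20, rng=rng)  # reduce to d = 20 before clustering
centers, labels = kmeans(Xp, k=5, rng=rng)
```

The choice of d governs the tradeoff described above: a smaller d makes each Lloyd iteration cheaper, while a larger d keeps pairwise squared distances, and hence the clustering risk, closer to their values in the original space.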