On the Performance of Clustering in Hilbert Spaces

  • Authors:
  • G. Biau;L. Devroye;G. Lugosi

  • Affiliations:
  • LSTA & LPMA, Univ. Pierre et Marie Curie-Paris VI, Paris, France;-;-

  • Venue:
  • IEEE Transactions on Information Theory
  • Year:
  • 2008

Abstract

Based on randomly drawn vectors in a separable Hilbert space, one may construct a k-means clustering scheme by minimizing an empirical squared error. We investigate the risk of such a clustering scheme, defined as the expected squared distance of a random vector X from the set of cluster centers. Our main result states that, for an almost surely bounded X, the expected excess clustering risk is O(√(1/n)). Since clustering in high (or even infinite)-dimensional spaces may lead to severe computational problems, we examine the properties of a dimension reduction strategy for clustering based on Johnson-Lindenstrauss-type random projections. Our results reflect a tradeoff between accuracy and computational complexity when one uses k-means clustering after random projection of the data to a low-dimensional space. We argue that random projections work better than other simplistic dimension reduction schemes.
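
The pipeline the abstract describes (random projection to a low-dimensional space, then k-means by empirical squared error) can be sketched roughly as follows. This is a minimal illustration, not the authors' procedure: the function names, the Gaussian projection matrix, the synthetic data, and the use of Lloyd's heuristic (a local search rather than the exact empirical minimizer analyzed in the paper) are all assumptions made for the example, and the risk is evaluated in the projected space only.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def empirical_risk(points, centers):
    """Mean squared distance from each point to its nearest center."""
    return sum(min(sq_dist(p, c) for c in centers) for p in points) / len(points)


def random_projection(points, target_dim, rng):
    """Map points to target_dim coordinates with an i.i.d. Gaussian matrix.

    Entries are N(0, 1/target_dim), one common Johnson-Lindenstrauss-type
    choice (an assumption here, not necessarily the paper's construction).
    """
    d = len(points[0])
    scale = target_dim ** -0.5
    matrix = [[rng.gauss(0.0, 1.0) * scale for _ in range(d)]
              for _ in range(target_dim)]
    return [[sum(r * x for r, x in zip(row, p)) for row in matrix]
            for p in points]


def lloyd_kmeans(points, k, rng, iterations=20):
    """Lloyd's heuristic for k-means; finds a local optimum only."""
    centers = [list(p) for p in rng.sample(points, k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: sq_dist(p, centers[i]))
            clusters[nearest].append(p)
        for j, members in enumerate(clusters):
            if members:  # keep the previous center if a cluster is empty
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return centers


rng = random.Random(0)
dim = 50
# Hypothetical data: two Gaussian groups in 50 dimensions.
data = [[rng.gauss(mean, 1.0) for _ in range(dim)]
        for mean in (0.0, 5.0) for _ in range(100)]

projected = random_projection(data, target_dim=5, rng=rng)
centers = lloyd_kmeans(projected, k=2, rng=rng)
risk = empirical_risk(projected, centers)
```

Lowering `target_dim` makes each distance computation cheaper but distorts distances more, which is the accuracy versus complexity tradeoff the abstract refers to.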