Approximate kernel k-means: solution to large scale kernel clustering

  • Authors:
  • Radha Chitta; Rong Jin; Timothy C. Havens; Anil K. Jain

  • Affiliations:
  • Michigan State University, East Lansing, MI, USA; Michigan State University, East Lansing, MI, USA; Michigan State University, East Lansing, MI, USA; Michigan State University, East Lansing, MI, USA

  • Venue:
  • Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining
  • Year:
  • 2011

Abstract

The explosion of digital data mandates the development of scalable tools to organize the data in a meaningful and easily accessible form. Clustering is a commonly used tool for data organization. However, many clustering algorithms designed to handle large data sets assume that the data are linearly separable and hence do not perform well on real-world data sets. While kernel-based clustering algorithms can capture the non-linear structure in data, they do not scale well in terms of speed and memory requirements when the number of objects to be clustered exceeds tens of thousands. We propose an approximation scheme for kernel k-means, termed approximate kernel k-means, that reduces both the computational complexity and the memory requirements by employing a randomized approach. We show both analytically and empirically that the performance of approximate kernel k-means is similar to that of the kernel k-means algorithm, but with dramatically reduced run-time complexity and memory requirements.
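The abstract does not spell out the randomized scheme, so the following is only a minimal sketch of one way such an approximation could work, under the assumption that each cluster center is restricted to the span of a small random sample of the points in kernel space. Only an n x m kernel block is then needed instead of the full n x n kernel matrix. The function names (`rbf_kernel`, `approx_kernel_kmeans`), the Gaussian kernel, and all parameters (`gamma`, `n_samples`, `n_iter`, `seed`) are illustrative choices, not taken from the paper.

```python
import numpy as np


def rbf_kernel(A, B, gamma=1.0):
    """Gaussian (RBF) kernel between the rows of A and the rows of B (hypothetical helper)."""
    sq = (A ** 2).sum(axis=1)[:, None] + (B ** 2).sum(axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))


def approx_kernel_kmeans(X, n_clusters, n_samples, gamma=1.0, n_iter=50, seed=0):
    """Sketch: kernel k-means with centers confined to the span of a random sample.

    Assumption (not stated in the abstract): center k is written as
    sum_j alpha[k, j] * phi(x_j) over the sampled points x_j.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]

    # Randomly sample m points; only an n x m kernel block is ever formed.
    idx = rng.choice(n, size=n_samples, replace=False)
    K_B = rbf_kernel(X, X[idx], gamma)      # n x m block: all points vs. sample
    K_hat = K_B[idx]                        # m x m block: sample vs. sample
    K_hat_pinv = np.linalg.pinv(K_hat)

    labels = rng.integers(n_clusters, size=n)   # random initial assignment
    for _ in range(n_iter):
        # Row-normalized membership matrix U (n_clusters x n).
        U = np.zeros((n_clusters, n))
        U[labels, np.arange(n)] = 1.0
        U /= np.maximum(U.sum(axis=1, keepdims=True), 1.0)

        # Coefficients of each center in the sampled basis (n_clusters x m).
        alpha = U @ K_B @ K_hat_pinv

        # Squared distance to each center, dropping the constant k(x_i, x_i) term.
        center_norms = np.sum((alpha @ K_hat) * alpha, axis=1)
        dist = center_norms[None, :] - 2.0 * (K_B @ alpha.T)

        new_labels = dist.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
    return labels
```

In this sketch the memory footprint is dominated by the n x m block `K_B`, so with m much smaller than n it grows linearly in n rather than quadratically, which is the kind of saving the abstract describes. A call such as `approx_kernel_kmeans(X, n_clusters=2, n_samples=20, gamma=0.5)` returns one integer label per row of `X`.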