Vector quantization based approximate spectral clustering of large datasets

  • Authors: Kadim Taşdemir
  • Affiliations: European Commission Joint Research Centre, Institute for Environment and Sustainability, Via E. Fermi 2749, Ispra (VA), Italy
  • Venue: Pattern Recognition
  • Year: 2012

Abstract

Spectral partitioning, which has recently become popular for unsupervised clustering, is infeasible for large datasets because of its computational complexity and memory requirements. Approximate spectral clustering (ASC) addresses this by applying spectral clustering to data representatives selected by various sampling methods. As an alternative, we propose using neural networks (self-organizing maps and neural gas), which have been shown to achieve quantization with small distortion, as the preliminary sampling step for ASC. We show that they usually outperform k-means sampling (which was previously shown to be superior to various other sampling methods) in terms of the clustering accuracy obtained by ASC. More importantly, for quantization-based ASC, we introduce a local density-based similarity measure, constructed without any user-set parameter, which achieves higher accuracies than the commonly used distance-based similarity.
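
The general ASC pipeline described in the abstract (quantize the data, spectrally cluster the prototypes, then propagate labels to the data points) can be illustrated with a minimal sketch. This is not the paper's implementation: it assumes k-means sampling (the baseline the abstract compares against) in place of self-organizing maps or neural gas, and a Gaussian distance-based similarity with a user-set width `sigma` in place of the proposed density-based measure, whose definition the abstract does not give. The function names, the farthest-first initialization, and all parameter values are hypothetical choices made for illustration.

```python
import numpy as np

def kmeans(X, k, n_iter=50, seed=0):
    """Plain Lloyd's k-means; returns (centers, labels)."""
    rng = np.random.default_rng(seed)
    # Farthest-first initialization (an illustrative choice): start from a
    # random point, then repeatedly add the point farthest from all centers.
    centers = [X[rng.integers(len(X))]]
    for _ in range(k - 1):
        d = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[d.argmax()])
    centers = np.array(centers, dtype=float)
    for _ in range(n_iter):
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            members = X[labels == j]
            if len(members):  # keep the old center if a cell is empty
                centers[j] = members.mean(axis=0)
    d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return centers, d.argmin(axis=1)

def approximate_spectral_clustering(X, n_prototypes, n_clusters,
                                    sigma=1.0, seed=0):
    # Step 1: quantize the data into a small set of prototypes.
    # ASSUMPTION: k-means stands in for SOM / neural gas here.
    prototypes, nearest = kmeans(X, n_prototypes, seed=seed)

    # Step 2: similarity between prototypes.
    # ASSUMPTION: Gaussian distance-based similarity, not the paper's
    # parameter-free density-based measure.
    d2 = ((prototypes[:, None, :] - prototypes[None, :, :]) ** 2).sum(axis=2)
    W = np.exp(-d2 / (2.0 * sigma ** 2))

    # Step 3: normalized affinity D^{-1/2} W D^{-1/2} and its leading
    # eigenvectors (eigh returns eigenvalues in ascending order).
    deg = W.sum(axis=1)
    inv_sqrt = 1.0 / np.sqrt(np.maximum(deg, 1e-12))
    L = W * inv_sqrt[:, None] * inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(L)
    U = vecs[:, -n_clusters:]
    U = U / np.maximum(np.linalg.norm(U, axis=1, keepdims=True), 1e-12)

    # Step 4: cluster the prototypes in the spectral embedding.
    _, proto_labels = kmeans(U, n_clusters, seed=seed)

    # Step 5: each data point inherits the label of its nearest prototype.
    return proto_labels[nearest]
```

The cost saving comes from Step 3: the eigendecomposition is performed on an `n_prototypes × n_prototypes` matrix rather than on a matrix whose size is the number of data points. A call such as `approximate_spectral_clustering(X, n_prototypes=10, n_clusters=2)` returns one cluster label per row of `X`.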