Quantization-based clustering algorithm

  • Authors: Zhiwen Yu; Hau-San Wong
  • Affiliations: School of Computer Science and Engineering, South China University of Technology, China and Department of Computer Science, City University of Hong Kong, Kowloon, Hong Kong; Department of Computer Science, City University of Hong Kong, Kowloon, Hong Kong
  • Venue: Pattern Recognition
  • Year: 2010

Abstract

In this paper, a quantization-based clustering algorithm (QBCA) is proposed to cluster a large number of data points efficiently. Unlike previous clustering algorithms, QBCA places more emphasis on the computation time of the algorithm. Specifically, QBCA first assigns the data points to a set of histogram bins by a quantization function. Then, it determines the initial centers of the clusters according to this point distribution. Finally, QBCA performs clustering at the histogram bin level, rather than the data point level. We also propose two refinements that further improve the performance of QBCA: (i) a shrinking process is performed on the histogram bins to reduce the number of distance computations and (ii) a hierarchical structure is constructed to perform efficient indexing on the histogram bins. We analyze the performance of QBCA theoretically and experimentally and show that the approach (1) can be easily implemented, (2) identifies the clusters effectively, and (3) outperforms most of the current state-of-the-art clustering approaches in terms of efficiency.
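The three steps named in the abstract (quantize points into histogram bins, pick initial centers from the bin distribution, cluster at the bin level) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not specify the quantization function, the seeding rule, or the update rule, so the uniform grid, the "densest bins as seeds" choice, and the weighted Lloyd-style iteration below are all assumptions, and every function name (`quantize`, `build_histogram`, `cluster_bins`, `qbca_sketch`) is hypothetical.

```python
from collections import defaultdict


def quantize(points, bins_per_dim):
    """Map each point to a tuple of per-dimension bin indices (assumed uniform grid)."""
    dims = len(points[0])
    lo = [min(p[d] for p in points) for d in range(dims)]
    hi = [max(p[d] for p in points) for d in range(dims)]
    keys = []
    for p in points:
        key = []
        for d in range(dims):
            width = (hi[d] - lo[d]) / bins_per_dim or 1.0  # guard against zero range
            key.append(min(int((p[d] - lo[d]) / width), bins_per_dim - 1))
        keys.append(tuple(key))
    return keys


def build_histogram(points, keys):
    """Return {bin_key: (count, centroid)} for the non-empty bins only."""
    counts = defaultdict(int)
    sums = {}
    for p, key in zip(points, keys):
        counts[key] += 1
        if key not in sums:
            sums[key] = list(p)
        else:
            sums[key] = [s + x for s, x in zip(sums[key], p)]
    return {k: (counts[k], [s / counts[k] for s in sums[k]]) for k in counts}


def sq_dist(a, b):
    """Squared Euclidean distance between two coordinate sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def cluster_bins(hist, k, iters=20):
    """Weighted Lloyd-style iterations over bin centroids instead of raw points."""
    # Assumed seeding: centroids of the k most populated bins (ties broken by key).
    bins = sorted(hist.items(), key=lambda kv: (-kv[1][0], kv[0]))
    centers = [list(c) for _, (_, c) in bins[:k]]
    labels = {}
    for _ in range(iters):
        # Assign every bin to its nearest center.
        for key, (_, c) in bins:
            labels[key] = min(range(len(centers)),
                              key=lambda j: sq_dist(c, centers[j]))
        # Move each center to the count-weighted mean of its bins.
        for j in range(len(centers)):
            members = [(n, c) for key, (n, c) in bins if labels[key] == j]
            total = sum(n for n, _ in members)
            if total:
                dims = len(centers[j])
                centers[j] = [sum(n * c[d] for n, c in members) / total
                              for d in range(dims)]
    return centers, labels


def qbca_sketch(points, k, bins_per_dim=8):
    """Quantize, cluster the bins, then give each point the label of its bin."""
    keys = quantize(points, bins_per_dim)
    hist = build_histogram(points, keys)
    centers, bin_labels = cluster_bins(hist, k)
    return centers, [bin_labels[key] for key in keys]


if __name__ == "__main__":
    pts = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (0.3, 0.3),
           (10.0, 10.0), (10.1, 9.9), (9.9, 10.2), (10.2, 10.1)]
    centers, labels = qbca_sketch(pts, k=2, bins_per_dim=4)
    print(centers, labels)
```

In this sketch the cost of each iteration depends on the number of non-empty bins rather than the number of points, which is the source of the speed-up the abstract emphasizes. The shrinking process and the hierarchical bin index mentioned in the abstract are not modeled here, and if fewer than `k` bins are non-empty the sketch simply returns fewer centers.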