Novel multi-centroid, multi-run sampling schemes for $K$-medoids-based algorithms

  • Authors:
  • Shu-Chuan Chu; John F. Roddick; Jeng-Shyang Pan

  • Affiliations:
  • Sch. of Inform. and Eng., Flinders Univ. of South Aus., GPO Box 2100, Adelaide 5001, South Australia and Dept. of Industrial Eng. and Mgmt., Kaohsiung Univ. of Applied Sci., Kaohsiung, Taiwan; School of Informatics and Engineering, Flinders University of South Australia, GPO Box 2100, Adelaide 5001, South Australia; Department of Electronic Engineering, Kaohsiung University of Applied Sciences, Kaohsiung, Taiwan (Correspondence: Tel.: +886 7 3814526 ext. 5636; Fax: +886 7 3811182; E-mail: jspan@cc.kuas.edu.tw)

  • Venue:
  • International Journal of Knowledge-based and Intelligent Engineering Systems
  • Year:
  • 2004

Abstract

Clustering in data mining groups similar objects based on their distance, connectivity, relative density, or other specific characteristics. Data clustering has become an important task for discovering significant patterns and characteristics in large spatial databases. The k-medoids-based algorithms have been shown to be effective for spherical-shaped clusters with outliers. However, they are not efficient for large databases. In this paper, we propose two novel algorithms: the Multi-Centroid with Multi-Run Sampling Scheme, termed MCMRS, and a more advanced sampling scheme, the Incremental Multi-Centroid, Multi-Run Sampling Scheme, hereafter called IMCMRS. Both aim to improve the performance of many k-medoids-based algorithms, including PAM, CLARA and CLARANS. Experimental results demonstrate that the proposed scheme not only reduces computation time by more than 80% but also reduces the average distance per object compared with CLARA and CLARANS. IMCMRS is also superior to MCMRS.
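
The abstract does not give the details of MCMRS or IMCMRS, so the following is not the authors' method. It is only a minimal sketch of the CLARA-style baseline idea that the abstract compares against: draw several random samples, run a simplified swap-based k-medoids search on each, and keep the medoids that give the lowest average distance per object over the full data set. The function names (`average_distance`, `pam_swap`, `multi_run_sampling`) and the parameters `runs`, `sample_size` and `seed` are assumptions made for illustration, and Euclidean distance is assumed as the dissimilarity measure.

```python
import math
import random


def average_distance(points, medoids):
    """Average distance per object: mean distance to the nearest medoid."""
    return sum(min(math.dist(p, m) for m in medoids) for p in points) / len(points)


def pam_swap(sample, k, rng):
    """Simplified PAM-like search: accept any swap that lowers the cost."""
    medoids = rng.sample(sample, k)
    best = average_distance(sample, medoids)
    improved = True
    while improved:
        improved = False
        for i in range(k):
            for candidate in sample:
                if candidate in medoids:
                    continue
                # Try replacing the i-th medoid with a non-medoid object.
                trial = medoids[:i] + [candidate] + medoids[i + 1:]
                cost = average_distance(sample, trial)
                if cost < best:
                    medoids, best, improved = trial, cost, True
    return medoids


def multi_run_sampling(points, k, runs=5, sample_size=40, seed=0):
    """CLARA-style baseline (assumed here, not MCMRS): best medoids over runs."""
    rng = random.Random(seed)
    best_medoids, best_cost = None, math.inf
    for _ in range(runs):
        sample = rng.sample(points, min(sample_size, len(points)))
        medoids = pam_swap(sample, k, rng)
        # Score the sample's medoids against the whole data set.
        cost = average_distance(points, medoids)
        if cost < best_cost:
            best_medoids, best_cost = medoids, cost
    return best_medoids, best_cost
```

Because each run only searches a small sample, the cost of the swap search depends on `sample_size` rather than on the size of the database, which is the efficiency motivation the abstract gives for sampling schemes.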