Fast and robust general purpose clustering algorithms

  • Authors: Vladimir Estivill-Castro; Jianhua Yang

  • Affiliations: Department of Computer Science & Software Engineering, The University of Newcastle, Callaghan, NSW, Australia (both authors)

  • Venue: PRICAI'00: Proceedings of the 6th Pacific Rim International Conference on Artificial Intelligence
  • Year: 2000

Abstract

General-purpose and highly applicable clustering methods are required for knowledge discovery. K-MEANS has been adopted as the prototype of iterative model-based clustering because of its speed, its simplicity, and its ability to work with very large databases. However, K-MEANS has several disadvantages that derive from its statistical simplicity. We propose algorithms that remain very efficient, generally applicable, and multidimensional, yet are more robust to noise and outliers. We achieve this by using medians rather than means as estimators of cluster centers. Comparisons with K-MEANS, EM, and Gibbs sampling demonstrate the advantages of our algorithms.
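
The abstract does not spell out the proposed algorithms, so the following is only a minimal sketch of the general idea of replacing means with medians as center estimators. It assumes a Lloyd-style iteration with coordinate-wise medians and Manhattan distance; the function name `k_medians`, its parameters, and the sample data are hypothetical and are not taken from the paper.

```python
from statistics import median


def k_medians(points, centers, iterations=10):
    """Illustrative Lloyd-style loop using coordinate-wise medians.

    This is an assumed, generic variant, not the authors' algorithm.
    """
    centers = [tuple(c) for c in centers]
    for _ in range(iterations):
        # Assignment step: each point joins its nearest center (L1 distance).
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(
                range(len(centers)),
                key=lambda j: sum(abs(a - b) for a, b in zip(p, centers[j])),
            )
            clusters[nearest].append(p)

        # Update step: the median of each coordinate replaces the mean.
        new_centers = []
        for j, members in enumerate(clusters):
            if members:
                new_centers.append(tuple(median(col) for col in zip(*members)))
            else:
                # Keep the old center if no point was assigned to it.
                new_centers.append(centers[j])

        if new_centers == centers:
            break  # assignments can no longer change
        centers = new_centers
    return centers


# Hypothetical data: two tight groups plus one extreme outlier.
points = [
    (1.0, 1.0), (1.2, 0.8), (0.9, 1.1),
    (5.0, 5.0), (5.1, 4.9), (4.8, 5.2),
    (100.0, 100.0),
]
centers = k_medians(points, centers=[(0.0, 0.0), (6.0, 6.0)])

# For comparison: the mean of the x-coordinates in the group that
# absorbs the outlier is dragged far away from the dense region.
mean_x = sum(x for x, _ in points[3:]) / len(points[3:])
```

In this toy setting the outlier at (100, 100) falls into the second group. The median-based center stays close to the dense region around (5, 5), whereas `mean_x` is pulled above 28, which illustrates the kind of robustness to outliers the abstract refers to.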