A novel typical-sample-weighted clustering algorithm for large data sets

  • Authors:
  • Jie Li; Xinbo Gao; Licheng Jiao

  • Affiliations:
  • School of Electronic Engineering, Xidian University, Xi’an, China (all three authors)

  • Venue:
  • CIS'05 Proceedings of the 2005 international conference on Computational Intelligence and Security - Volume Part I
  • Year:
  • 2005

Abstract

In cluster analysis, most existing algorithms were developed for small data sets and cannot effectively process the large data sets encountered in data mining. Moreover, most clustering algorithms treat every sample as contributing equally to the classification, although in fact different samples should contribute differently to the clustering result. To address both issues, a novel typical-sample-weighted clustering algorithm is proposed for large data sets. The new algorithm first applies atom clustering to extract typical samples, which reduces the amount of data. The extracted samples are then weighted by their corresponding typicality and clustered with the classical fuzzy c-means (FCM) algorithm. Finally, each original sample is assigned to one of the obtained clusters using the Mahalanobis distance. Because FCM iterates only over the reduced, typicality-weighted set of samples, the new algorithm improves both the speed and the robustness of the traditional FCM algorithm. Experimental results on various test data sets illustrate the effectiveness of the proposed clustering algorithm.
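The three stages described in the abstract (reduce, weighted FCM, Mahalanobis assignment) can be sketched in Python with NumPy. This is a minimal sketch under stated assumptions, not the authors' implementation. The abstract does not define atom clustering or the typicality measure, so the hypothetical `extract_typical_samples` substitutes a plain k-means with many small clusters ("atoms") and uses each atom's size as its weight. The weighted FCM scales each typical sample's contribution to the center update by that weight. The Mahalanobis step uses fuzzy covariance matrices estimated from the weighted memberships; this choice is also an assumption. All function names here are illustrative.

```python
import numpy as np


def extract_typical_samples(X, n_atoms, n_iter=20, seed=0):
    """ASSUMPTION: stand-in for atom clustering. Plain k-means with many
    small clusters; atom centroids are the typical samples and atom sizes
    serve as their (assumed) typicality weights."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), n_atoms, replace=False)].copy()
    for _ in range(n_iter):
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for a in range(n_atoms):
            members = X[labels == a]
            if len(members):
                centers[a] = members.mean(axis=0)
    sizes = np.bincount(labels, minlength=n_atoms).astype(float)
    keep = sizes > 0  # drop atoms that ended up empty
    return centers[keep], sizes[keep]


def weighted_fcm(T, w, c, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """FCM on typical samples T with per-sample weights w.
    Centers: v_i = sum_k w_k u_ik^m t_k / sum_k w_k u_ik^m."""
    rng = np.random.default_rng(seed)
    U = rng.random((c, len(T)))
    U /= U.sum(axis=0)  # memberships of each sample sum to 1
    for _ in range(n_iter):
        Um = (U ** m) * w  # weight scales each sample's contribution
        V = Um @ T / Um.sum(axis=1, keepdims=True)
        d2 = ((T[None, :, :] - V[:, None, :]) ** 2).sum(axis=2)
        d2 = np.fmax(d2, 1e-12)  # avoid division by zero at a center
        inv = d2 ** (-1.0 / (m - 1))
        U_new = inv / inv.sum(axis=0)  # standard FCM membership update
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return V, U


def fuzzy_covariances(T, w, V, U, m=2.0, reg=1e-6):
    """ASSUMPTION: per-cluster covariance weighted by w_k * u_ik^m,
    plus a small ridge so every matrix is invertible."""
    Um = (U ** m) * w
    S = []
    for i in range(len(V)):
        diff = T - V[i]
        Si = (Um[i][:, None] * diff).T @ diff / Um[i].sum()
        S.append(Si + reg * np.eye(T.shape[1]))
    return np.array(S)


def assign_mahalanobis(X, V, S):
    """Label each original sample with the cluster of minimum
    Mahalanobis distance (x - v_i)^T S_i^{-1} (x - v_i)."""
    S_inv = np.linalg.inv(S)  # batched inverse, shape (c, d, d)
    diff = X[:, None, :] - V[None, :, :]  # shape (n, c, d)
    d2 = np.einsum("ncd,cde,nce->nc", diff, S_inv, diff)
    return d2.argmin(axis=1)


def typical_sample_fcm(X, c, n_atoms, m=2.0, seed=0):
    T, w = extract_typical_samples(X, n_atoms, seed=seed)
    V, U = weighted_fcm(T, w, c, m=m, seed=seed)
    S = fuzzy_covariances(T, w, V, U, m=m)
    return assign_mahalanobis(X, V, S), V


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = np.vstack([rng.normal(loc, 0.5, size=(200, 2))
                   for loc in ([0, 0], [5, 0], [0, 5])])
    labels, V = typical_sample_fcm(X, c=3, n_atoms=30)
    print(np.bincount(labels))
    print(V)
```

In this sketch, FCM iterates over only the `n_atoms` typical samples rather than all original samples. Only the final Mahalanobis assignment touches every original sample, which is the source of the speedup the abstract describes.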