Editorial: New fuzzy c-means clustering model based on the data weighted approach

  • Authors:
  • Chenglong Tang; Shigang Wang; Wei Xu

  • Affiliations:
  • School of Mechanical and Dynamical Engineering of Shanghai Jiao Tong University, No.800 Dong Chuan Road, Minhang District, Shanghai 200240, PR China (all three authors)

  • Venue:
  • Data & Knowledge Engineering
  • Year:
  • 2010

Abstract

This paper proposes a data weighted fuzzy c-means clustering approach. Unlike most existing fuzzy clustering approaches, it takes into account the internal connectivity of all data points. The model introduces a vector of exponent impact factors and an influence exponent, which together shape the clustering process. Data weighted clustering simultaneously produces three categories of parameters: fuzzy membership degrees, exponent impact factors, and cluster prototypes. A new fuzzy algorithm, DWG-K, is developed by combining the data weighted approach with the Gustafson-Kessel (G-K) algorithm. Two groups of numerical experiments were carried out. Group 1 evaluates the clustering performance of DWG-K against G-K; the results show that DWG-K achieves better clustering quality while keeping the same level of computational efficiency as G-K. Group 2 tests the ability of DWG-K to mine outliers, with the well-known local outlier factor (LOF) method as the baseline; the results show that DWG-K has a considerable advantage over LOF in computational efficiency, and that the outliers it finds are global. The paper concludes that the data weighted clustering approach has distinct advantages for mining outliers in large-scale data sets, for improving clustering results, and especially when both tasks are performed simultaneously.
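
For orientation, the sketch below shows the standard fuzzy c-means iteration that the abstract builds on: alternating updates of fuzzy membership degrees and cluster prototypes. It is not the DWG-K algorithm. The abstract does not give the update rules for the exponent impact factors or the influence exponent, so none are implemented here. The function name `fuzzy_c_means` and the optional `weights` argument (fixed, user-supplied per-point weights) are assumptions made purely for illustration, and plain Euclidean distance is used in place of the adaptive G-K distance.

```python
import math
import random


def fuzzy_c_means(points, c, m=2.0, weights=None, iters=100, tol=1e-6, seed=0):
    """Standard fuzzy c-means with optional FIXED per-point weights.

    Assumption: `weights` is a hypothetical illustration parameter. It is
    not the paper's exponent impact factor model, which is learned jointly
    with the memberships and prototypes.
    Returns (centers, u), where u[i][k] is the membership of point k in
    cluster i, taken from the last membership update.
    """
    n, dim = len(points), len(points[0])
    w = list(weights) if weights is not None else [1.0] * n
    rng = random.Random(seed)
    # Initialise prototypes from c distinct data points.
    centers = [list(p) for p in rng.sample(list(points), c)]
    u = [[0.0] * n for _ in range(c)]
    power = 2.0 / (m - 1.0)  # requires fuzzifier m > 1

    for _ in range(iters):
        # Membership update: u_ik = 1 / sum_j (d_ik / d_jk)^(2/(m-1)).
        for k, x in enumerate(points):
            d = [math.dist(x, v) for v in centers]
            zeros = [i for i, di in enumerate(d) if di == 0.0]
            for i in range(c):
                if zeros:
                    # Point coincides with one or more prototypes:
                    # split full membership among them.
                    u[i][k] = 1.0 / len(zeros) if i in zeros else 0.0
                else:
                    u[i][k] = 1.0 / sum((d[i] / dj) ** power for dj in d)

        # Prototype update: weighted mean with coefficients w_k * u_ik^m.
        new_centers = []
        for i in range(c):
            coef = [w[k] * u[i][k] ** m for k in range(n)]
            total = sum(coef)
            new_centers.append(
                [sum(coef[k] * points[k][j] for k in range(n)) / total
                 for j in range(dim)]
            )

        # Stop once no prototype moves more than tol.
        shift = max(math.dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break

    return centers, u
```

Calling `fuzzy_c_means(points, c=2)` on a list of coordinate tuples returns the prototypes and a membership matrix whose columns each sum to 1. Giving a point a small fixed weight reduces its pull on the prototypes, which is only a rough analogue of the idea of weighting data points; the paper's own formulation should be consulted for the actual model.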