Applications of an enhanced cluster validity index method based on the Fuzzy C-means and rough set theories to partition and classification

  • Authors:
  • Kuang Yu Huang

  • Affiliations:
  • Department of Information Management, Ling Tung University, #1 Ling Tung Road, Taichung City 408, Taiwan

  • Venue:
  • Expert Systems with Applications: An International Journal
  • Year:
  • 2010

Abstract

This study proposes a cluster validity index method that simultaneously measures the goodness of clustering of the clustered data and the classification accuracy for complicated information systems, based on the PBMF-index method and rough set (RS) theory. The maximum value of this index, called the Huang-index, not only yields the best partitioning but also gives the optimal classification accuracy for the approximation sets. The traditional PBMF-index method is used only to ensure the formation of a small number of compact clusters with large separation between at least two clusters. In contrast, the Huang-index method extends the application of unsupervised optimal clustering to the field of classification. In the proposed algorithm, all the attributes of the data are first clustered into groups using the Fuzzy C-means (FCM) method. The clustered data are then used, based on RS theory, to identify the approximation regions and the classification accuracy and to calculate the cluster centroids for the decision attribute. Finally, all of these calculated quantities are fed into the proposed index method to obtain the cluster validity index. The validity of the proposed approach is demonstrated using data derived from a hypothetical function of two independent variables and electronic stock data extracted from the financial database maintained by the Taiwan Economic Journal (TEJ). The clustering results obtained using the proposed method are compared with those obtained using the traditional PBMF-index partition method. The effects of the number of clusters on the cluster partitions and the RS regions are systematically examined and compared. The results show that the proposed Huang-index method not only yields a clustering capability superior to that of the traditional clustering algorithm, but also yields a reliable classification and a set of suitable decision rules extracted using RS theory.
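
The building blocks named in the abstract can be sketched in Python as follows. This is only an illustration under stated assumptions, not the author's implementation: the abstract does not give the Huang-index formula, so it is not reproduced here. The sketch shows a plain FCM loop, the PBMF-index in its commonly cited form (an assumption about the exact variant the paper uses), and the standard Pawlak accuracy of approximation from RS theory. All function and parameter names are hypothetical.

```python
import numpy as np

def fuzzy_c_means(X, c, m=2.0, n_iter=100, seed=0):
    """Plain FCM sketch: returns (centroids V, memberships U of shape (c, n))."""
    rng = np.random.default_rng(seed)
    U = rng.random((c, X.shape[0]))
    U /= U.sum(axis=0)                       # memberships of each point sum to 1
    for _ in range(n_iter):
        W = U ** m
        V = (W @ X) / W.sum(axis=1, keepdims=True)    # fuzzy-weighted centroids
        D = np.linalg.norm(X[None, :, :] - V[:, None, :], axis=2) + 1e-12
        inv = D ** (-2.0 / (m - 1.0))
        U = inv / inv.sum(axis=0)            # standard FCM membership update
    return V, U

def pbmf_index(X, V, U, m=2.0):
    """PBMF-index in its commonly cited form (assumed): ((1/c) * E1/Jm * Dc)^2.

    Requires at least two clusters, since Dc is a pairwise separation.
    """
    c = V.shape[0]
    E1 = np.linalg.norm(X - X.mean(axis=0), axis=1).sum()   # one-cluster scatter
    D = np.linalg.norm(X[None, :, :] - V[:, None, :], axis=2)
    Jm = ((U ** m) * D).sum()                                # fuzzy compactness
    Dc = max(np.linalg.norm(V[i] - V[j])
             for i in range(c) for j in range(i + 1, c))     # largest separation
    return ((1.0 / c) * (E1 / Jm) * Dc) ** 2

def approximation_accuracy(condition_labels, decision_labels, target):
    """Pawlak accuracy |lower| / |upper| of the set {decision == target}.

    Objects sharing a condition label form one equivalence class.
    Assumes `target` occurs at least once in `decision_labels`.
    """
    lower = upper = 0
    for g in np.unique(condition_labels):
        block = decision_labels[condition_labels == g]
        hits = int((block == target).sum())
        if hits == block.size:
            lower += block.size      # class lies entirely inside the target set
        if hits > 0:
            upper += block.size      # class overlaps the target set
    return lower / upper

if __name__ == "__main__":
    # Synthetic data (hypothetical): three blobs in the plane.
    rng = np.random.default_rng(1)
    X = np.vstack([rng.normal(loc, 0.3, size=(30, 2)) for loc in (0.0, 5.0, 10.0)])
    for c in range(2, 6):
        V, U = fuzzy_c_means(X, c)
        print(c, pbmf_index(X, V, U))
```

In this sketch the candidate number of clusters is scanned and the index is evaluated for each; the abstract describes the same kind of scan, with the Huang-index additionally taking the RS classification accuracy into account.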