An enhanced classification method comprising a genetic algorithm, rough set theory and a modified PBMF-index function

  • Authors: Kuang Yu Huang
  • Affiliations: Department of Information Management, Ling Tung University, #1 Ling Tung Road, Taichung City 408, Taiwan
  • Venue: Applied Soft Computing
  • Year: 2012

Abstract

This study proposes a method, designated as the GRP-index method, for classifying continuous-valued datasets in which the instances carry no class information and may be imprecise and uncertain. The method discretizes the values of the individual attributes within the dataset and seeks both the optimal number of clusters and the optimal classification accuracy. It consists of a genetic algorithm (GA) and an FRP-index method. In the FRP-index method, the conditional and decision attribute values of the instances are fuzzified and discretized using the Fuzzy C-means (FCM) method, in accordance with cluster vectors supplied by the GA that specify the number of clusters per attribute. Rough set (RS) theory is then applied to determine the lower and upper approximate sets associated with each cluster of the decision attribute. The accuracy of approximation of each cluster of the decision attribute is computed as the ratio of the cardinality of its lower approximate set to that of its upper approximate set. The centroids of the lower approximate sets associated with each cluster of the decision attribute are then determined by computing the mean conditional and decision attribute values of all the instances within the corresponding sets. The cluster centroids and accuracies of approximation are processed by a modified form of the PBMF-index function, designated as the RP-index function, to evaluate the optimality of the discretization/classification results. If the termination criteria are not satisfied, the GA modifies the population of cluster vectors, and the FCM, RS and RP-index function procedures are repeated; this cycle continues until the termination criteria are met. The maximum value of the RP cluster validity index is then identified, and the corresponding cluster vector is taken as the optimal classification result.
The validity of the proposed approach is confirmed by cross-validation and by comparing the classification results obtained for a typical stock market dataset with those obtained by unsupervised and pseudo-supervised classification methods. The results show that the proposed GRP-index method achieves not only better discretization performance than the methods considered, but also a better accuracy of approximation, and therefore provides a more reliable basis for extracting decision-making rules.
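
The rough set step described in the abstract can be illustrated with a minimal Python sketch. It assumes the attributes have already been discretized into integer cluster labels (the abstract uses FCM for this; that step is not shown), and it treats instances with identical conditional labels as indiscernible, which is the standard rough set construction rather than a detail confirmed by the abstract. The function names and the toy data are hypothetical and are not taken from the paper.

```python
from collections import defaultdict

def approximations(condition_labels, decision_labels, target):
    """Return the (lower, upper) approximation index sets of one decision cluster."""
    # Instances sharing the same discretized conditional labels are indiscernible.
    classes = defaultdict(set)
    for i, cond in enumerate(condition_labels):
        classes[tuple(cond)].add(i)
    target_set = {i for i, d in enumerate(decision_labels) if d == target}
    lower, upper = set(), set()
    for members in classes.values():
        if members <= target_set:   # class lies entirely inside the cluster
            lower |= members
        if members & target_set:    # class overlaps the cluster
            upper |= members
    return lower, upper

def accuracy_of_approximation(lower, upper):
    """Cardinality ratio |lower| / |upper| (0.0 when the upper set is empty)."""
    return len(lower) / len(upper) if upper else 0.0

def centroid(values, indices):
    """Mean attribute vector over the instances in `indices`."""
    rows = [values[i] for i in sorted(indices)]
    return [sum(col) / len(rows) for col in zip(*rows)]

# Hypothetical toy data: five instances, two conditional attributes.
condition_labels = [(0, 0), (0, 0), (0, 1), (1, 1), (1, 1)]
decision_labels = [0, 0, 0, 1, 0]
# Hypothetical continuous values (one conditional, one decision attribute).
values = [[1.0, 10.0], [2.0, 12.0], [3.0, 14.0], [8.0, 30.0], [9.0, 11.0]]

lower, upper = approximations(condition_labels, decision_labels, target=0)
print(sorted(lower), sorted(upper))             # [0, 1, 2] [0, 1, 2, 3, 4]
print(accuracy_of_approximation(lower, upper))  # 0.6
print(centroid(values, lower))                  # [2.0, 12.0]
```

Instances 3 and 4 share the same conditional labels but fall in different decision clusters, so they appear only in the upper approximation, which lowers the accuracy of approximation for cluster 0 to 3/5. In the method described above, these accuracies and the lower-set centroids would then feed the RP-index function, which is not reproduced here because the abstract does not give its form.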