Using optimisation techniques for discretizing rough set partitions

  • Authors:
  • Bodie Crossingham;Tshilidzi Marwala

  • Affiliations:
  • (Correspd. Tel.: +27 117 177 217/ Fax: +27 114 031 929/ E-mail: bodie@lutrin.co.za) School of Electrical and Information Engineering, University of the Witwatersrand, Private Bag X3/ WITS, 2050/ S ...;School of Electrical and Information Engineering, University of the Witwatersrand, Private Bag X3/ WITS, 2050/ South Africa

  • Venue:
  • International Journal of Hybrid Intelligent Systems - Computational Models for Life Sciences
  • Year:
  • 2008

Abstract

Rough set theory (RST) is concerned with the formal approximation of crisp sets and is a mathematical tool for dealing with vagueness and uncertainty. This paper presents an approach to optimizing rough set partition sizes using four optimization techniques: genetic algorithm, particle swarm optimization, hill climbing and simulated annealing. Forecasting accuracy is measured by the area under the curve (AUC) of the receiver operating characteristic (ROC) curve. The proposed method is tested on two data sets: the human immunodeficiency virus (HIV) data set and the militarized interstate dispute (MID) data set. The results of this granulization method are compared with those of two earlier static granulization methods, equal-width-bin and equal-frequency-bin partitioning. The results show that all of the proposed optimized methods produce higher forecasting accuracies than the two static methods. On the HIV data set, the hill climbing approach produced the highest accuracy, 69.02%, in a time of 210.4 hours. On the MID data set, the genetic algorithm approach produced the highest accuracy, 95.82%, in a time of 7 hours. The rules generated from the rough set are linguistic and easy to interpret, but this comes at the expense of accuracy lost in the discretization process, where the granularity of the variables is decreased.
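To make the comparison in the abstract concrete, the following is a minimal Python sketch of the two static baselines (equal-width-bin and equal-frequency-bin partitioning), an AUC computation, and a simple hill climb over the cut points of one variable. It is not the authors' implementation. All function names are hypothetical, and the scoring step (`bin_rate_auc`, which scores each sample by the positive rate of its bin) is an assumed stand-in for the rough set rule classifier, which the abstract does not specify.

```python
import numpy as np

def equal_width_cuts(x, n_bins):
    """Interior cut points splitting [min(x), max(x)] into equal-width bins."""
    x = np.asarray(x, dtype=float)
    return np.linspace(x.min(), x.max(), n_bins + 1)[1:-1]

def equal_frequency_cuts(x, n_bins):
    """Interior cut points at evenly spaced quantiles (roughly equal counts per bin)."""
    x = np.asarray(x, dtype=float)
    return np.quantile(x, np.linspace(0.0, 1.0, n_bins + 1)[1:-1])

def discretize(x, cuts):
    """Map each value to the index of the interval it falls into."""
    return np.searchsorted(np.sort(cuts), np.asarray(x, dtype=float), side="right")

def auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half. Both classes must be present in y_true.
    """
    y_true = np.asarray(y_true)
    scores = np.asarray(scores, dtype=float)
    pos, neg = scores[y_true == 1], scores[y_true == 0]
    diff = pos[:, None] - neg[None, :]
    return ((diff > 0).sum() + 0.5 * (diff == 0).sum()) / (len(pos) * len(neg))

def bin_rate_auc(x, y, cuts):
    """Assumed stand-in objective: score each sample by its bin's positive rate."""
    y = np.asarray(y)
    bins = discretize(x, cuts)
    rates = np.zeros(len(cuts) + 1)
    for b in range(len(cuts) + 1):
        mask = bins == b
        if mask.any():
            rates[b] = y[mask].mean()
    return auc(y, rates[bins])

def hill_climb_cuts(x, y, init_cuts, n_iter=200, step=0.05, seed=0):
    """Perturb the cut points at random and keep a candidate only if AUC improves."""
    x = np.asarray(x, dtype=float)
    rng = np.random.default_rng(seed)
    span = x.max() - x.min()
    best = np.sort(np.asarray(init_cuts, dtype=float))
    best_score = bin_rate_auc(x, y, best)
    for _ in range(n_iter):
        noise = rng.normal(0.0, step * span, size=best.shape)
        cand = np.sort(np.clip(best + noise, x.min(), x.max()))
        score = bin_rate_auc(x, y, cand)
        if score > best_score:
            best, best_score = cand, score
    return best, best_score
```

A typical use would start the search from one of the static partitions, for example `hill_climb_cuts(x, y, equal_width_cuts(x, 3))`, so that the optimized cut points can be compared directly with the baseline they started from. Because a candidate is accepted only when it improves the score, the returned AUC is never lower than that of the starting partition on the same data.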