Using optimisation techniques for discretizing rough set partitions

Authors:
Bodie Crossingham;Tshilidzi Marwala
Affiliations:
(Corresponding author. Tel.: +27 117 177 217, Fax: +27 114 031 929, E-mail: bodie@lutrin.co.za) School of Electrical and Information Engineering, University of the Witwatersrand, Private Bag X3, WITS, 2050, S ...;School of Electrical and Information Engineering, University of the Witwatersrand, Private Bag X3, WITS, 2050, South Africa
Venue:
International Journal of Hybrid Intelligent Systems - Computational Models for Life Sciences
Year:
2008

Abstract

Rough set theory (RST) is concerned with the formal approximation of crisp sets and is a mathematical tool for dealing with vagueness and uncertainty. This paper presents an approach that optimizes rough set partition sizes using four optimization techniques: genetic algorithm, particle swarm optimization, hill climbing and simulated annealing. Forecasting accuracy is measured by the area under the receiver operating characteristic (ROC) curve (AUC). The proposed method is tested on two data sets, namely the human immunodeficiency virus (HIV) data set and the militarized interstate dispute (MID) data set. The results obtained from this granulization method are compared with those of two static granulization methods, namely equal-width-bin and equal-frequency-bin partitioning. The results show that all of the proposed optimized methods produce higher forecasting accuracies than the two static methods. For the HIV data set, the hill climbing approach produced the highest accuracy, 69.02%, in a time of 210.4 hours. For the MID data set, the genetic algorithm approach produced the highest accuracy, 95.82%, in a time of 7 hours. The rules generated from the rough set are linguistic and easy to interpret, but this comes at the expense of accuracy lost in the discretization process, where the granularity of the variables is decreased.
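
The contrast between static and optimized partitioning can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the single-attribute setting, and the per-bin positive rate used as a stand-in for the rough set rule classifier are all assumptions made for illustration, and the AUC is computed on the same data used to place the cuts. It shows equal-width and equal-frequency cut points, a rank-based AUC, and a simple hill climber that perturbs cut points and keeps a move only when the AUC improves.

```python
import bisect
import random


def equal_width_cuts(values, n_bins):
    """Static partition: cut points that split the value range into equal widths."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    return [lo + i * width for i in range(1, n_bins)]


def equal_frequency_cuts(values, n_bins):
    """Static partition: cut points that put roughly equal counts in each bin."""
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[(i * n) // n_bins] for i in range(1, n_bins)]


def discretize(values, cuts):
    """Map each value to the index of the bin it falls in (cuts must be sorted)."""
    return [bisect.bisect_right(cuts, v) for v in values]


def auc(scores, labels):
    """Rank-based AUC: fraction of (positive, negative) pairs ranked correctly,
    with ties counted as half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def bin_scores(values, labels, cuts):
    """ASSUMPTION: score each sample by the positive rate of its bin.
    This is a stand-in for a rough set rule classifier, not the paper's model."""
    bins = discretize(values, cuts)
    totals, hits = {}, {}
    for b, y in zip(bins, labels):
        totals[b] = totals.get(b, 0) + 1
        hits[b] = hits.get(b, 0) + y
    return [hits[b] / totals[b] for b in bins]


def hill_climb_cuts(values, labels, start_cuts, step, iters=200, seed=0):
    """Hill climbing over cut points: nudge one cut at random and keep the
    change only if the AUC strictly improves."""
    rng = random.Random(seed)
    best = sorted(start_cuts)
    best_auc = auc(bin_scores(values, labels, best), labels)
    for _ in range(iters):
        cand = list(best)
        i = rng.randrange(len(cand))
        cand[i] += rng.uniform(-step, step)
        cand.sort()
        cand_auc = auc(bin_scores(values, labels, cand), labels)
        if cand_auc > best_auc:
            best, best_auc = cand, cand_auc
    return best, best_auc
```

A typical use would start the climber from one of the static partitions, for example `hill_climb_cuts(values, labels, equal_width_cuts(values, 4), step=1.0)`, so the optimized result is never worse than that starting point on the data being scored. Because only improving moves are accepted, the search can stall on a plateau, which is one reason population-based or annealing-style searches such as those named in the abstract might be preferred.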