Differential Evolution for automatic rule extraction from medical databases

  • Authors:
  • Ivanoe De Falco

  • Affiliations:
  • Institute of High Performance Computing and Networking, National Research Council of Italy (ICAR-CNR), 80131 Naples, Italy

  • Venue:
  • Applied Soft Computing
  • Year:
  • 2013

Quantified Score

Hi-index 0.00

Visualization

Abstract

In this paper, a new approach based on Differential Evolution (DE) for the automatic classification of items in medical databases is proposed. Based on it, a tool called DEREx is presented, which automatically extracts explicit knowledge from the database under the form of IF-THEN rules containing AND-connected clauses on the database variables. Each DE individual codes for a set of rules. For each class more than one rule can be contained in the individual, and these rules can be seen as logically connected in OR. Furthermore, all the classifying rules for all the classes are found all at once in one step. DEREx is thought as a useful support to decision making whenever explanations on why an item is assigned to a given class should be provided, as it is the case for diagnosis in the medical domain. The major contribution of this paper is that DEREx is the first classification tool in literature that is based on DE and automatically extracts sets of IF-THEN rules without the intervention of any other mechanism. In fact, all other classification tools based on DE existing in literature either simply find centroids for the classes rather than extracting rules, or are hybrid systems in which DE simply optimizes some parameters whereas the classification capabilities are provided by other mechanisms. For the experiments eight databases from the medical domain have been considered. First, among ten classical DE variants, the most effective of them in terms of highest classification accuracy in a ten-fold cross-validation has been found. Secondly, the tool has been compared over the same eight databases against a set of fifteen classifiers widely used in literature. The results have proven the effectiveness of the proposed approach, since DEREx turns out to be the best performing tool in terms of highest classification accuracy. Also statistical analysis has confirmed that DEREx is the best classifier. When compared to the other rule-based classification tools here used, DEREx needs the lowest average number of rules to face a problem, and the average number of clauses per rule is not very high. In conclusion, the tool here presented is preferable to the other classifiers because it shows good classification accuracy, automatically extracts knowledge, and provides users with it under an easily comprehensible form.