Parameter determination and feature selection for C4.5 algorithm using scatter search approach

  • Authors:
  • Shih-Wei Lin; Shih-Chieh Chen

  • Affiliations:
  • Chang Gung University, Department of Information Management, Taoyuan, Taiwan; Chang Gung University, Department of Information Management, Taoyuan, Taiwan and National Taiwan University of Science and Technology, Department of Industrial Management, Taipei, Taiwan

  • Venue:
  • Soft Computing - A Fusion of Foundations, Methodologies and Applications
  • Year:
  • 2012

Abstract

The C4.5 decision tree (DT) is applied in many fields and produces knowledge in a form that humans can understand. However, different problems typically require different parameter settings, and these are generally chosen by rule of thumb or trial and error, which may yield poor settings and unsatisfactory results. In addition, although a dataset can contain numerous features, not all of them are beneficial for classification with the C4.5 algorithm. Therefore, a novel scatter search-based approach (SS + DT) is proposed to obtain optimal parameter settings and to select the subset of beneficial features that yields better classification results. The performance of the proposed SS + DT approach is evaluated on datasets from the UCI (University of California, Irvine) Machine Learning Repository. Experimental results demonstrate that the parameter settings for the C4.5 algorithm obtained by the SS + DT approach are better than those obtained by other approaches. When feature selection is also considered, classification accuracy rates increase on most datasets. The proposed approach can therefore be used to effectively identify the best parameter settings for the C4.5 algorithm together with useful features.
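
The idea of searching parameters and a feature subset jointly can be illustrated with a minimal Python sketch. It is not the authors' SS + DT implementation: the abstract does not specify the solution encoding, the tuned parameters, or the scatter search operators, so everything below is an assumption made for illustration. A candidate is encoded as a hypothetical tuple `(min_leaf, confidence, feature_mask)`, the reference set is updated by quality only (a full scatter search would typically also keep diverse solutions and apply an improvement step), and `toy_fitness` is a made-up stand-in for the cross-validated accuracy of a decision tree.

```python
import random


def scatter_search(evaluate, n_features, pop_size=20, ref_size=5, iters=30, seed=0):
    """Simplified scatter-search-style loop (illustrative assumption, not the paper's method).

    A solution is a tuple (min_leaf, confidence, feature_mask); `evaluate` maps a
    solution to a score where higher is better.
    """
    rng = random.Random(seed)

    def repair(mask):
        # Guarantee that at least one feature stays selected.
        if not any(mask):
            mask[rng.randrange(n_features)] = True
        return tuple(mask)

    def random_solution():
        mask = [rng.random() < 0.5 for _ in range(n_features)]
        return (rng.randint(1, 20), rng.uniform(0.01, 0.5), repair(mask))

    def combine(a, b):
        # Midpoint for the numeric parameters, uniform crossover for the mask.
        min_leaf = max(1, round((a[0] + b[0]) / 2))
        confidence = (a[1] + b[1]) / 2
        mask = [x if rng.random() < 0.5 else y for x, y in zip(a[2], b[2])]
        return (min_leaf, confidence, repair(mask))

    # Diversification: random initial pool; the reference set keeps the best ones.
    pool = [random_solution() for _ in range(pop_size)]
    ref = sorted(pool, key=evaluate, reverse=True)[:ref_size]

    for _ in range(iters):
        # Subset generation (all pairs) followed by solution combination.
        children = [combine(a, b) for i, a in enumerate(ref) for b in ref[i + 1:]]
        # Reference set update: keep the best distinct solutions seen so far.
        merged = set(ref) | set(children)
        new_ref = sorted(merged, key=evaluate, reverse=True)[:ref_size]
        if new_ref == ref:  # no change, stop early
            break
        ref = new_ref
    return ref[0]


def toy_fitness(solution):
    """Hypothetical objective standing in for cross-validated tree accuracy."""
    min_leaf, confidence, mask = solution
    useful = {0, 2}  # pretend only features 0 and 2 carry signal
    hits = sum(1 for i, on in enumerate(mask) if on and i in useful)
    noise = sum(1 for i, on in enumerate(mask) if on and i not in useful)
    return hits - 0.5 * noise - abs(min_leaf - 5) / 20 - abs(confidence - 0.25)


if __name__ == "__main__":
    best = scatter_search(toy_fitness, n_features=6)
    print("best solution:", best, "score:", toy_fitness(best))
```

In a real experiment, `toy_fitness` would be replaced by a function that trains a decision tree with the decoded parameters on the selected features and returns its validation accuracy; the search loop itself would not need to change.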