An investigation into the interaction between feature selection and discretization: learning how and when to read numbers

  • Authors:
  • Sumukh Ghodke; Timothy Baldwin

  • Affiliations:
  • Department of Computer Science and Software Engineering, University of Melbourne, VIC, Australia; Department of Computer Science and Software Engineering, University of Melbourne, VIC, Australia and NICTA Victoria Laboratories, University of Melbourne, VIC, Australia

  • Venue:
  • AI'07 Proceedings of the 20th Australian joint conference on Advances in artificial intelligence
  • Year:
  • 2007

Abstract

Pre-processing is an important part of machine learning, and has been shown to significantly improve the performance of classifiers. In this paper, we take a selection of pre-processing methods, focusing specifically on discretization and feature selection, and empirically examine their combined effect on classifier performance. In our experiments, we use four learning algorithms, namely one-R, ID3, naive Bayes, and IB1, and explore the impact of different forms of pre-processing on each combination of dataset and algorithm. We find that, in general, the combination of wrapper-based forward selection and naive supervised methods of discretization yields consistently above-baseline results.
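
To make the two pre-processing steps named in the abstract concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes one common reading of "naive supervised" discretization (a cut point between any two adjacent sorted values whose class labels differ) and uses a simple 1-nearest-neighbour learner with leave-one-out accuracy as the wrapper's evaluation function. All function names and the toy data are hypothetical.

```python
from bisect import bisect_right


def naive_supervised_cuts(values, labels):
    """Assumed definition: place a cut midway between adjacent sorted
    values whenever their class labels differ."""
    pairs = sorted(zip(values, labels))
    cuts = []
    for (v1, y1), (v2, y2) in zip(pairs, pairs[1:]):
        if y1 != y2 and v1 != v2:
            cuts.append((v1 + v2) / 2)
    return cuts


def discretize(values, cuts):
    """Map each value to the index of the interval it falls into."""
    return [bisect_right(cuts, v) for v in values]


def loo_accuracy(rows, labels, features):
    """Leave-one-out accuracy of 1-NN using Hamming distance
    restricted to the given feature indices."""
    correct = 0
    for i, row in enumerate(rows):
        best_j, best_d = None, None
        for j, other in enumerate(rows):
            if i == j:
                continue
            d = sum(row[f] != other[f] for f in features)
            if best_d is None or d < best_d:
                best_j, best_d = j, d
        correct += labels[best_j] == labels[i]
    return correct / len(rows)


def forward_select(rows, labels):
    """Wrapper-based forward selection: greedily add the feature that
    most improves the wrapped learner's accuracy; stop when none does."""
    selected, best = [], 0.0
    remaining = list(range(len(rows[0])))
    while remaining:
        score, f = max(
            ((loo_accuracy(rows, labels, selected + [f]), f) for f in remaining),
            key=lambda t: (t[0], -t[1]),  # prefer lower index on ties
        )
        if score <= best:
            break
        selected.append(f)
        remaining.remove(f)
        best = score
    return selected, best


# Toy data (hypothetical): feature 0 separates the classes, feature 1 is noise.
X = [[1.0, 5.0], [1.5, 1.0], [2.0, 4.0], [4.0, 2.0], [4.5, 5.5], [5.0, 0.5]]
y = ["a", "a", "a", "b", "b", "b"]

columns = list(zip(*X))
binned = [discretize(col, naive_supervised_cuts(col, y)) for col in columns]
rows = [list(r) for r in zip(*binned)]

print(forward_select(rows, y))
```

For brevity the sketch derives cut points from the full dataset before evaluating; in a real experiment the cut points would be computed from the training portion of each fold only, since supervised discretization otherwise leaks label information into the evaluation.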