An investigation into the interaction between feature selection and discretization: learning how and when to read numbers

Authors:
Sumukh Ghodke;Timothy Baldwin
Affiliations:
Department of Computer Science and Software Engineering, University of Melbourne, VIC, Australia;Department of Computer Science and Software Engineering, University of Melbourne, VIC, Australia and NICTA Victoria Laboratories, University of Melbourne, VIC, Australia
Venue:
AI'07 Proceedings of the 20th Australian joint conference on Advances in artificial intelligence
Year:
2007

Abstract

Pre-processing is an important part of machine learning, and has been shown to significantly improve the performance of classifiers. In this paper, we take a selection of pre-processing methods, focusing specifically on discretization and feature selection, and empirically examine their combined effect on classifier performance. In our experiments, we take several learning algorithms, namely one-R, ID3, naive Bayes, and IB1, and explore the impact of different forms of pre-processing on each combination of dataset and algorithm. We find that, in general, the combination of wrapper-based forward selection and naive supervised methods of discretization yields consistently above-baseline results.
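
To make the two pre-processing steps named in the abstract concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes that "naive supervised discretization" can be approximated by placing a cut point wherever the class label changes along a sorted numeric feature, and that the wrapper scores each candidate feature subset by the leave-one-out accuracy of a 1-nearest-neighbour learner (in the spirit of IB1) with an overlap distance. All function names and the toy data are hypothetical.

```python
import bisect
from typing import Callable, List, Sequence, Tuple


def class_change_cut_points(values: Sequence[float], labels: Sequence[str]) -> List[float]:
    """Assumed 'naive supervised' rule: cut midway between adjacent sorted
    values whenever the class label changes."""
    pairs = sorted(zip(values, labels), key=lambda p: p[0])
    cuts = []
    for (v1, y1), (v2, y2) in zip(pairs, pairs[1:]):
        if y1 != y2 and v1 != v2:
            cuts.append((v1 + v2) / 2)
    return cuts


def discretize(value: float, cuts: Sequence[float]) -> int:
    """Map a numeric value to the index of the interval it falls into."""
    return bisect.bisect_right(cuts, value)


def loo_accuracy_1nn(rows: Sequence[Sequence[int]], labels: Sequence[str],
                     features: Sequence[int]) -> float:
    """Leave-one-out accuracy of 1-NN using overlap distance on `features`.
    Distance ties are broken in favour of the earliest instance."""
    correct = 0
    for i, row in enumerate(rows):
        best_j, best_d = None, None
        for j, other in enumerate(rows):
            if j == i:
                continue
            d = sum(row[f] != other[f] for f in features)
            if best_d is None or d < best_d:
                best_j, best_d = j, d
        correct += labels[best_j] == labels[i]
    return correct / len(rows)


def forward_select(rows: Sequence[Sequence[int]], labels: Sequence[str], n_features: int,
                   score: Callable[..., float]) -> Tuple[List[int], float]:
    """Greedy wrapper: repeatedly add the feature that most improves the
    score, and stop as soon as no candidate gives a strict improvement."""
    selected: List[int] = []
    best = float("-inf")
    while len(selected) < n_features:
        candidates = [f for f in range(n_features) if f not in selected]
        top_score, top_f = max(
            ((score(rows, labels, selected + [f]), f) for f in candidates),
            key=lambda t: (t[0], -t[1]),
        )
        if top_score <= best:
            break
        selected.append(top_f)
        best = top_score
    return selected, best


# Toy data (hypothetical): feature 0 separates the classes, feature 1 is noise.
raw = [(1.0, 5.0), (1.5, 1.0), (2.0, 4.0), (3.0, 2.0), (3.5, 4.5), (4.0, 1.5)]
labels = ["a", "a", "a", "b", "b", "b"]

columns = list(zip(*raw))
cuts = [class_change_cut_points(col, labels) for col in columns]
rows = [[discretize(v, cuts[f]) for f, v in enumerate(r)] for r in raw]
selected, best = forward_select(rows, labels, len(columns), loo_accuracy_1nn)
print(cuts[0], selected, best)
```

On this toy data the informative feature needs a single cut point, while the noisy feature is fragmented into many intervals, and the wrapper keeps only the informative feature. For brevity the cut points here are computed on the full dataset; a careful evaluation would learn them inside each training fold.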