Data pre-processing: a new algorithm for feature selection and data discretization

  • Authors:
  • Marcela X. Ribeiro; Mônica R. P. Ferreira; Caetano Traina, Jr.; Agma J. M. Traina

  • Affiliations:
  • University of São Paulo, São Carlos, SP, Brazil; University of São Paulo, São Carlos, SP, Brazil; University of São Paulo, São Carlos, SP, Brazil; University of São Paulo, São Carlos, SP, Brazil

  • Venue:
  • CSTST '08 Proceedings of the 5th international conference on Soft computing as transdisciplinary science and technology
  • Year:
  • 2008

Abstract

Data pre-processing is a key element in improving the accuracy of data mining algorithms. In the pre-processing step, the data are treated to make the mining process feasible and effective. Data discretization and feature selection are two important tasks that can be performed prior to the learning phase, and they can significantly reduce the processing effort of the data mining algorithm. In this paper, we present Omega, a new algorithm that performs data discretization and feature selection simultaneously. We validated Omega by comparing it with other well-known algorithms for data discretization (1R, ChiMerge and Chi2) and feature selection (DTM, Relief and Chi2). The experiments compared the effects of these pre-processing techniques on the results of C4.5, a well-known decision-tree classifier. In the results, the discretization provided by Omega produced decision trees with one of the smallest average numbers of nodes, and the feature selection given by Omega led to one of the smallest average error rates. These results indicate that Omega is well suited to both data discretization and feature selection, making it highly appropriate for pre-processing data for data mining tasks.
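The abstract does not describe how Omega itself works, so the sketch below does not attempt to reproduce it. Instead, it illustrates supervised discretization with ChiMerge, one of the baseline discretizers named above. This is a minimal sketch under stated assumptions: the function names `chi2_pair` and `chimerge` are hypothetical, the stopping rule is a single chi-square threshold supplied by the caller, and a zero expected count is replaced by 0.1, following the common description of ChiMerge. Adjacent intervals whose class distributions are statistically similar (low chi-square) are merged until every adjacent pair differs by at least the threshold.

```python
from collections import Counter


def chi2_pair(a, b, classes):
    """Chi-square statistic for two adjacent intervals, given their class counts."""
    rows = [a, b]
    row_totals = [sum(r[c] for c in classes) for r in rows]
    n = sum(row_totals)
    stat = 0.0
    for c in classes:
        col_total = a[c] + b[c]
        for r, r_total in zip(rows, row_totals):
            expected = r_total * col_total / n
            if expected == 0:
                expected = 0.1  # assumed convention to avoid division by zero
            stat += (r[c] - expected) ** 2 / expected
    return stat


def chimerge(values, labels, threshold):
    """Return the lower bounds of the intervals found by a basic ChiMerge pass."""
    classes = sorted(set(labels))
    # Start with one interval per distinct value, storing its class counts.
    counts = {}
    for v, y in zip(values, labels):
        counts.setdefault(v, Counter())[y] += 1
    intervals = [[v, counts[v]] for v in sorted(counts)]

    while len(intervals) > 1:
        stats = [chi2_pair(intervals[i][1], intervals[i + 1][1], classes)
                 for i in range(len(intervals) - 1)]
        best = min(range(len(stats)), key=stats.__getitem__)
        if stats[best] >= threshold:
            break  # every adjacent pair is now significantly different
        # Merge interval best+1 into interval best, keeping the lower bound.
        intervals[best][1] = intervals[best][1] + intervals[best + 1][1]
        del intervals[best + 1]
    return [lower for lower, _ in intervals]


if __name__ == "__main__":
    # 2.706 is the chi-square critical value for 1 degree of freedom (2 classes)
    # at the 0.90 level.
    print(chimerge([1, 2, 3, 4, 5, 6], ["a", "a", "a", "b", "b", "b"], 2.706))
```

On this toy attribute the values split cleanly by class, so the sketch returns the cut points `[1, 4]`, i.e., the intervals [1, 4) and [4, ∞). A discretized attribute like this is what a classifier such as C4.5 would then consume in place of the raw continuous values.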