Data pre-processing: a new algorithm for feature selection and data discretization

Authors:
Marcela X. Ribeiro;Mônica R. P. Ferreira;Caetano Traina, Jr.;Agma J. M. Traina
Affiliations:
University of São Paulo, São Carlos, SP, Brazil;University of São Paulo, São Carlos, SP, Brazil;University of São Paulo, São Carlos, SP, Brazil;University of São Paulo, São Carlos, SP, Brazil
Venue:
CSTST '08 Proceedings of the 5th international conference on Soft computing as transdisciplinary science and technology
Year:
2008

References (12):
A practical approach to feature selection. ML92 Proceedings of the ninth international workshop on Machine learning.
C4.5: programs for machine learning.
Very Simple Classification Rules Perform Well on Most Commonly Used Datasets. Machine Learning.
Discretization: An Enabling Technique. Data Mining and Knowledge Discovery.
Feature Selection via Discretization. IEEE Transactions on Knowledge and Data Engineering.
Dimensionality Reduction in Automatic Knowledge Acquisition: A Simple Greedy Search Approach. IEEE Transactions on Knowledge and Data Engineering.
A Discretization Algorithm Based on a Heterogeneity Criterion. IEEE Transactions on Knowledge and Data Engineering.
A comparative analysis of discretization methods for Medical Datamining with Naïve Bayesian classifier. ICIT '06 Proceedings of the 9th International Conference on Information Technology.
Attractive Feature Reduction Approach for Colon Data Classification. AINAW '07 Proceedings of the 21st International Conference on Advanced Information Networking and Applications Workshops - Volume 01.
A Branch and Bound Algorithm for Feature Subset Selection. IEEE Transactions on Computers.
Dimensionality Reduction and Similarity Computation by Inner-Product Approximations. IEEE Transactions on Knowledge and Data Engineering.
Dimensionality reduction in high-dimensional space for multimedia information retrieval. DEXA'07 Proceedings of the 18th international conference on Database and Expert Systems Applications.

Abstract

Data pre-processing is a key element in improving the accuracy of data mining algorithms. In the pre-processing step, the data are treated to make the mining process feasible and effective. Data discretization and feature selection are two important tasks that can be performed before the learning phase, and they can significantly reduce the processing effort of the data mining algorithm. In this paper, we present Omega, a new algorithm that performs data discretization and feature selection simultaneously. We validated Omega by comparing it with well-known algorithms for data discretization (1R, ChiMerge and Chi2) and for feature selection (DTM, Relief and Chi2). The experiments compared the effects of these pre-processing techniques on the results of C4.5, a well-known decision-tree classifier. In these experiments, the discretization produced by Omega yielded decision trees with one of the smallest average numbers of nodes, and the feature selection performed by Omega led to one of the smallest average error rates. These results indicate that Omega is well suited to both data discretization and feature selection, making it highly appropriate for pre-processing data for data mining tasks.
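The abstract does not describe how Omega itself works, so the sketch below does not attempt to reproduce it. It instead illustrates, under simplifying assumptions, how discretization and feature selection can interact, using ideas from two of the baselines named above: ChiMerge-style bottom-up discretization (repeatedly merge the pair of adjacent intervals whose class distributions are most similar under a chi-square test), followed by the Chi2-style observation that a feature collapsing into a single interval carries no class information and can be dropped. This is a simplified illustration: it uses one fixed threshold rather than Chi2's iterative adjustment of the significance level. The function names `chi_square`, `chimerge` and `select_features` are hypothetical, and the default threshold 2.706 (the chi-square critical value at the 0.90 level with one degree of freedom, i.e. a two-class problem) is an assumption.

```python
from collections import Counter


def chi_square(a, b, classes):
    """Chi-square statistic for two adjacent intervals given their class counts."""
    total_a, total_b = sum(a.values()), sum(b.values())
    n = total_a + total_b
    stat = 0.0
    for c in classes:
        col = a.get(c, 0) + b.get(c, 0)
        if col == 0:
            continue  # class absent from both intervals: its terms are 0
        for counts, row in ((a, total_a), (b, total_b)):
            expected = row * col / n
            stat += (counts.get(c, 0) - expected) ** 2 / expected
    return stat


def chimerge(values, labels, threshold=2.706):
    """ChiMerge-style discretization (simplified sketch).

    Returns the lower bound of each final interval. The threshold is an
    assumed chi-square critical value, not a value taken from the paper.
    """
    classes = set(labels)
    bounds, counts = [], []
    # Start with one interval per distinct value, holding its class counts.
    for v, y in sorted(zip(values, labels)):
        if bounds and bounds[-1] == v:
            counts[-1][y] += 1
        else:
            bounds.append(v)
            counts.append(Counter({y: 1}))
    # Merge the most similar adjacent pair until every pair differs significantly.
    while len(counts) > 1:
        chis = [chi_square(counts[i], counts[i + 1], classes)
                for i in range(len(counts) - 1)]
        i = min(range(len(chis)), key=chis.__getitem__)
        if chis[i] >= threshold:
            break
        counts[i] = counts[i] + counts[i + 1]
        del counts[i + 1]
        del bounds[i + 1]
    return bounds


def select_features(rows, labels, threshold=2.706):
    """Keep only features whose discretization yields more than one interval."""
    kept = {}
    for j in range(len(rows[0])):
        cuts = chimerge([r[j] for r in rows], labels, threshold)
        if len(cuts) > 1:  # a single interval cannot separate the classes
            kept[j] = cuts
    return kept


rows = [[1, 7], [2, 7], [3, 7], [4, 7], [5, 7], [6, 7]]
labels = ["a", "a", "a", "b", "b", "b"]
print(select_features(rows, labels))  # {0: [1, 4]}
```

In this toy example, feature 0 splits cleanly at 4 and is kept with two intervals, while the constant feature 1 collapses to one interval and is discarded, so the discretization step also performs the feature selection.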