Applying cost sensitive feature selection in an electric database

Authors:
Manuel Mejía-Lavalle
Affiliations:
Instituto de Investigaciones Eléctricas, Gerencia de Sistemas Informáticos, Cuernavaca, Morelos, México
Venue:
ISMIS'08 Proceedings of the 17th international conference on Foundations of intelligent systems
Year:
2008

Abstract

Feature selection is a crucial activity when knowledge discovery is applied to large databases, as it reduces dimensionality and therefore the complexity of the problem. Its main objective is to eliminate attributes to obtain a computationally tractable problem without affecting solution quality. Several feature selection methods have been proposed, some of them tested on small academic datasets. In this paper we evaluate different feature selection-ranking methods on a large real-world database from a Mexican electric energy client-invoice system. Most research on feature selection methods evaluates only accuracy and processing time; here we also report on cost-sensitive classification and the amount of discovered knowledge. Additionally, we stress the issue of the boundary that separates relevant from irrelevant features. Finally, we propose a promising feature selection heuristic based on the experiments performed, taking cost-sensitive classification into account.
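
To make the two ideas in the abstract concrete, here is a minimal sketch of what feature ranking and cost-sensitive evaluation can look like. It is not the heuristic proposed in the paper, which the abstract does not specify. It assumes information gain as the ranking metric, and the toy data, feature names, class labels, and cost values are all hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(column, labels):
    """Reduction in class entropy obtained by splitting on one discrete feature."""
    n = len(labels)
    remainder = 0.0
    for value in set(column):
        subset = [y for x, y in zip(column, labels) if x == value]
        remainder += len(subset) / n * entropy(subset)
    return entropy(labels) - remainder


def rank_features(rows, labels, names):
    """Return (feature name, score) pairs sorted from most to least informative."""
    scores = {
        name: information_gain([row[i] for row in rows], labels)
        for i, name in enumerate(names)
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


def total_cost(y_true, y_pred, cost):
    """Sum of misclassification costs; cost maps (true, predicted) to a number."""
    return sum(cost[(t, p)] for t, p in zip(y_true, y_pred))


# Hypothetical toy data: feature "a" determines the class, feature "b" is noise.
rows = [(1, 0), (1, 1), (0, 0), (0, 1)]
labels = ["pos", "pos", "neg", "neg"]
ranking = rank_features(rows, labels, ["a", "b"])

# Hypothetical cost matrix: missing a "pos" case costs five times a false alarm.
cost = {("pos", "pos"): 0, ("neg", "neg"): 0, ("pos", "neg"): 5, ("neg", "pos"): 1}
predictions = ["pos", "neg", "pos", "neg"]
cost_of_predictions = total_cost(labels, predictions, cost)
```

In this sketch, a ranking method orders the attributes by score, and the question of where to cut the ranking (the boundary between relevant and irrelevant features) can then be judged by total misclassification cost instead of plain accuracy.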