Applying cost sensitive feature selection in an electric database

  • Authors:
  • Manuel Mejía-Lavalle

  • Affiliations:
  • Instituto de Investigaciones Eléctricas, Gerencia de Sistemas Informáticos, Cuernavaca, Morelos, México

  • Venue:
  • ISMIS'08 Proceedings of the 17th international conference on Foundations of intelligent systems
  • Year:
  • 2008

Abstract

Feature selection is a crucial activity when knowledge discovery is applied to large databases, as it reduces dimensionality and therefore the complexity of the problem. Its main objective is to eliminate attributes to obtain a computationally tractable problem without affecting solution quality. Several feature selection methods have been proposed, some of them tested on small academic datasets. In this paper we evaluate different feature selection-ranking methods on a large real-world database from a Mexican electric energy client-invoice system. Most research on feature selection methods evaluates only accuracy and processing time; here we also report on cost-sensitive classification and the amount of discovered knowledge. Additionally, we stress the issue of the boundary that separates relevant from irrelevant features. Finally, we propose a promising feature selection heuristic based on the experiments performed, taking into account cost-sensitive classification.