Unsupervised joint feature discretization and selection

  • Authors:
  • Artur Ferreira;Mário Figueiredo

  • Affiliations:
  • Instituto Superior de Engenharia de Lisboa, and Instituto de Telecomunicações, Lisboa, Portugal;Instituto Superior Técnico, and Instituto de Telecomunicações, Lisboa, Portugal

  • Venue:
  • IbPRIA'11: Proceedings of the 5th Iberian Conference on Pattern Recognition and Image Analysis
  • Year:
  • 2011

Abstract

In many applications, we deal with high-dimensional datasets containing different types of data. For instance, in text classification and information retrieval problems, we have large collections of documents. Each text is usually represented by a bag-of-words or similar representation, with a large number of features (terms). Many of these features may be irrelevant (or even detrimental) for the learning task. This excessive number of features also creates a memory-usage problem when representing and processing these collections, which clearly shows the need for adequate techniques for feature representation, reduction, and selection, both to improve classification accuracy and to reduce memory requirements. In this paper, we propose a combined unsupervised feature discretization and feature selection technique. Experimental results on standard datasets show the efficiency of the proposed technique, as well as an improvement over previous similar techniques.
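The abstract does not describe how the proposed technique works, so the following is only a generic illustration of the idea of joint unsupervised discretization and selection, not the authors' algorithm. As assumptions, it discretizes each feature by equal-interval binning into 2^n_bits levels and then ranks the discretized features by their variance, a score that needs no class labels. The function names `discretize_equal_interval` and `select_by_dispersion` and the parameters `n_bits` and `m` are hypothetical.

```python
import numpy as np

# Hypothetical illustration only; NOT the method proposed in the paper.

def discretize_equal_interval(X, n_bits=3):
    """Unsupervised equal-interval binning of each column into 2**n_bits levels."""
    n_levels = 2 ** n_bits
    lo = X.min(axis=0)
    hi = X.max(axis=0)
    # Constant columns get span 1 so they map to bin 0 instead of dividing by zero.
    span = np.where(hi > lo, hi - lo, 1.0)
    idx = np.floor((X - lo) / span * n_levels).astype(int)
    # The column maximum lands exactly on n_levels; clip it into the last bin.
    return np.clip(idx, 0, n_levels - 1)

def select_by_dispersion(Xd, m):
    """Rank discretized features by variance (label-free score) and keep the top m."""
    scores = Xd.var(axis=0)
    order = np.argsort(scores)[::-1]  # indices sorted by descending variance
    return order[:m], scores

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.random((100, 5))
    X[:, 2] = 0.5  # a constant, uninformative feature
    Xd = discretize_equal_interval(X, n_bits=3)
    keep, scores = select_by_dispersion(Xd, m=3)
    print("kept feature indices:", keep)
```

In this toy run the constant feature has zero variance after discretization and is therefore never among the kept indices. Discretizing first also shrinks storage, since each kept feature needs only `n_bits` bits per value.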