Filter approach feature selection methods to support multi-label learning based on ReliefF and Information Gain

  • Authors:
  • Newton Spolaôr; Everton Alvares Cherman; Maria Carolina Monard; Huei Diana Lee

  • Affiliations:
  • Laboratory of Computational Intelligence, Institute of Mathematics and Computer Science, University of São Paulo, São Carlos, Brazil, and Laboratory of Bioinformatics, Western Paraná State University, Foz do Iguaçu, Brazil; Laboratory of Computational Intelligence, Institute of Mathematics and Computer Science, University of São Paulo, São Carlos, Brazil; Laboratory of Computational Intelligence, Institute of Mathematics and Computer Science, University of São Paulo, São Carlos, Brazil; Laboratory of Bioinformatics, Western Paraná State University, Foz do Iguaçu, Brazil

  • Venue:
  • SBIA'12 Proceedings of the 21st Brazilian conference on Advances in Artificial Intelligence
  • Year:
  • 2012

Abstract

In multi-label learning, each example in the dataset is associated with a set of labels, and the task of the generated classifier is to predict the label set of unseen examples. Feature selection is an important task in machine learning which aims to find a small number of features that describe the dataset as well as, or even better than, the original set of features does. This can be achieved by removing irrelevant and/or redundant features according to some importance criterion. Although effective feature selection methods to support classification for single-label data abound, this is not the case for multi-label data. This work proposes two multi-label feature selection methods that use the filter approach, which evaluates statistics of the data independently of any particular classifier. To this end, ReliefF, a single-label feature selection method, and an adaptation of the Information Gain measure for multi-label data are used to find the features that should be selected. Both methods were experimentally evaluated on ten benchmark datasets, taking into account the reduction in the number of features as well as the quality of the generated classifiers, and showed promising results.
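To illustrate the general idea of a filter-approach criterion like the one described, here is a minimal sketch of information-gain feature scoring for multi-label data. This is not the authors' implementation: it assumes discrete features and a simple binary-relevance-style adaptation (the information gain of each feature is averaged over the individual binary labels), which is only one of several ways to adapt Information Gain to the multi-label setting.

```python
import numpy as np

def entropy(labels):
    # Shannon entropy (base 2) of a discrete label vector.
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def information_gain(feature, labels):
    # IG(Y; X) = H(Y) - H(Y | X) for a discrete feature X and one binary label Y.
    total = entropy(labels)
    conditional = 0.0
    for value in np.unique(feature):
        mask = feature == value
        conditional += mask.mean() * entropy(labels[mask])
    return total - conditional

def multilabel_ig_scores(X, Y):
    # Score each feature by its information gain averaged over all labels
    # (binary-relevance-style multi-label adaptation; an assumption here,
    # not necessarily the adaptation used in the paper).
    n_features = X.shape[1]
    n_labels = Y.shape[1]
    scores = np.zeros(n_features)
    for j in range(n_features):
        scores[j] = np.mean(
            [information_gain(X[:, j], Y[:, l]) for l in range(n_labels)]
        )
    return scores

def select_features(X, Y, k):
    # Filter step: keep the k features with the highest average score,
    # without consulting any particular classifier.
    scores = multilabel_ig_scores(X, Y)
    return np.argsort(scores)[::-1][:k]
```

Because the scoring never trains or queries a classifier, this is a filter method in the sense used in the abstract: the selected feature subset can then be handed to any multi-label learner.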