Feature subset selection from positive and unlabelled examples

  • Authors:
  • Borja Calvo; Pedro Larrañaga; Jose A. Lozano

  • Affiliations:
  • Intelligent Systems Group, Department of Computer Science and Artificial Intelligence, University of the Basque Country, Paseo Manuel de Lardizabal, 1, E-20018 Donostia-San Sebastián, Spain; Departamento de Inteligencia Artificial, Universidad Politécnica de Madrid, E-28660 Boadilla del Monte, Spain; Intelligent Systems Group, Department of Computer Science and Artificial Intelligence, University of the Basque Country, Paseo Manuel de Lardizabal, 1, E-20018 Donostia-San Sebastián, Spain

  • Venue:
  • Pattern Recognition Letters
  • Year:
  • 2009



Abstract

The feature subset selection problem has growing importance in many machine learning applications where the number of variables is very high. A great number of algorithms can approach this problem in supervised databases, but when examples from one or more classes are not available, supervised feature subset selection algorithms cannot be directly applied. One such algorithm is correlation-based feature selection (CFS). In this work we propose an adaptation of this algorithm that can be applied when only positive and unlabelled examples are available. As far as we know, this is the first time the feature subset selection problem has been studied in the positive unlabelled learning context. We have tested this adaptation on synthetic datasets obtained by sampling Bayesian network models, where we know which variables are (in)dependent of the class. We have also tested it on real-life databases where the absence of negative examples has been simulated. The results show that, given enough positive examples, it is possible to obtain good solutions to the feature subset selection problem when only positive and unlabelled instances are available.
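For context, standard (fully supervised) CFS scores a feature subset by the ratio of the mean feature-class correlation to the mean feature-feature inter-correlation; the paper's contribution is estimating these quantities when only positive and unlabelled examples are available, which is not reproduced here. A minimal sketch of the supervised CFS merit with a greedy forward search (function names and the use of Pearson correlation are illustrative assumptions; Hall's original CFS uses symmetrical uncertainty and best-first search):

```python
import numpy as np

def cfs_merit(X, y, subset):
    """CFS merit of a subset of k features: k*r_cf / sqrt(k + k*(k-1)*r_ff),
    where r_cf is the mean absolute feature-class correlation and r_ff the
    mean absolute pairwise feature-feature correlation."""
    k = len(subset)
    if k == 0:
        return 0.0
    r_cf = np.mean([abs(np.corrcoef(X[:, f], y)[0, 1]) for f in subset])
    if k == 1:
        return float(r_cf)
    r_ff = np.mean([abs(np.corrcoef(X[:, a], X[:, b])[0, 1])
                    for i, a in enumerate(subset) for b in subset[i + 1:]])
    return float(k * r_cf / np.sqrt(k + k * (k - 1) * r_ff))

def cfs_forward_search(X, y):
    """Greedy forward search: repeatedly add the feature that most
    improves the merit; stop when no addition improves it."""
    selected, remaining = [], list(range(X.shape[1]))
    best = 0.0
    while remaining:
        score, f = max((cfs_merit(X, y, selected + [f]), f) for f in remaining)
        if score <= best:
            break
        best = score
        selected.append(f)
        remaining.remove(f)
    return selected
```

On a toy dataset where one feature tracks the class and another is pure noise, the search keeps only the informative feature, since adding the noise feature lowers the mean feature-class correlation without a compensating drop in redundancy.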