Feature subset selection from positive and unlabelled examples

  • Authors:
  • Borja Calvo; Pedro Larrañaga; Jose A. Lozano

  • Affiliations:
  • Intelligent Systems Group, Department of Computer Science and Artificial Intelligence, University of the Basque Country, Paseo Manuel de Lardizabal, 1, E-20018 Donostia-San Sebastián, Spain; Departamento de Inteligencia Artificial, Universidad Politécnica de Madrid, E-28660 Boadilla del Monte, Spain; Intelligent Systems Group, Department of Computer Science and Artificial Intelligence, University of the Basque Country, Paseo Manuel de Lardizabal, 1, E-20018 Donostia-San Sebastián, Spain

  • Venue:
  • Pattern Recognition Letters
  • Year:
  • 2009



Abstract

The feature subset selection problem has growing importance in many machine learning applications where the number of variables is very high. A great number of algorithms can approach this problem in supervised databases, but when examples from one or more classes are not available, supervised feature subset selection algorithms cannot be directly applied. One such algorithm is correlation-based feature selection (CFS). In this work we propose an adaptation of this algorithm that can be applied when only positive and unlabelled examples are available. As far as we know, this is the first time the feature subset selection problem has been studied in the positive unlabelled learning context. We have tested this adaptation on synthetic datasets obtained by sampling Bayesian network models, where we know which variables are (in)dependent of the class. We have also tested it on real-life databases where the absence of negative examples has been simulated. The results show that, given enough positive examples, it is possible to obtain good solutions to the feature subset selection problem when only positive and unlabelled instances are available.
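For context, standard (fully supervised) CFS scores a feature subset by the ratio of the mean feature-class correlation to the mean feature-feature inter-correlation; the paper's contribution is estimating these quantities when only positive and unlabelled examples are available, which is not reproduced here. A minimal sketch of the supervised CFS merit with a greedy forward search (function names and the use of Pearson correlation are illustrative assumptions; Hall's original CFS uses symmetrical uncertainty and best-first search):

```python
import numpy as np

def cfs_merit(X, y, subset):
    """CFS merit of a subset of k features: k*r_cf / sqrt(k + k*(k-1)*r_ff),
    where r_cf is the mean absolute feature-class correlation and r_ff the
    mean absolute pairwise feature-feature correlation."""
    k = len(subset)
    if k == 0:
        return 0.0
    r_cf = np.mean([abs(np.corrcoef(X[:, f], y)[0, 1]) for f in subset])
    if k == 1:
        return float(r_cf)
    r_ff = np.mean([abs(np.corrcoef(X[:, a], X[:, b])[0, 1])
                    for i, a in enumerate(subset) for b in subset[i + 1:]])
    return float(k * r_cf / np.sqrt(k + k * (k - 1) * r_ff))

def cfs_forward_search(X, y):
    """Greedy forward search: repeatedly add the feature that most
    improves the merit; stop when no addition improves it."""
    selected, remaining = [], list(range(X.shape[1]))
    best = 0.0
    while remaining:
        score, f = max((cfs_merit(X, y, selected + [f]), f) for f in remaining)
        if score <= best:
            break
        best = score
        selected.append(f)
        remaining.remove(f)
    return selected
```

On a toy dataset where one feature tracks the class and another is pure noise, the search keeps only the informative feature, since adding the noise feature lowers the mean feature-class correlation without a compensating drop in redundancy.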