A Model-Based Relevance Estimation Approach for Feature Selection in Microarray Datasets

Authors:
Gianluca Bontempi;Patrick E. Meyer
Affiliations:
Machine Learning Group, Computer Science Department, ULB, Université Libre de Bruxelles, Brussels, Belgium;Machine Learning Group, Computer Science Department, ULB, Université Libre de Bruxelles, Brussels, Belgium
Venue:
ICANN '08 Proceedings of the 18th international conference on Artificial Neural Networks, Part II
Year:
2008

Abstract

This paper presents an original model-based approach to feature selection and applies it to the classification of microarray datasets. Model-based approaches to feature selection are generally known as wrappers. Wrapper methods assess subsets of variables according to their usefulness to a given prediction model, which will eventually be used for classification. This strategy assumes that the accuracy of the model used for the wrapper selection is a good estimator of the relevance of the feature subset. First, we discuss the limits of this assumption by showing that assessing a subset by means of a generic learner (e.g. by cross-validation) returns a biased estimate of the relevance of the subset itself. Second, we propose a low-bias estimator of the relevance based on the cross-validation assessment of an unbiased learner. Third, we assess a feature selection approach that combines the low-bias relevance estimator with state-of-the-art relevance estimators in order to enhance their accuracy. The experimental validation on 20 publicly available cancer expression datasets shows the robustness of a selection approach that is not biased by a specific learner.
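
The wrapper assessment described in the abstract, scoring a feature subset by the cross-validated accuracy of a learner, can be illustrated with a minimal sketch. This is not the authors' low-bias estimator: the learner (1-nearest-neighbour), the leave-one-out scheme, the toy data, and the names `loo_accuracy` and `make_toy_data` are all assumptions chosen for illustration.

```python
import random


def loo_accuracy(X, y, subset):
    """Leave-one-out accuracy of a 1-nearest-neighbour classifier
    that only looks at the feature indices listed in `subset`."""
    n = len(X)
    correct = 0
    for i in range(n):
        best_dist, best_label = float("inf"), None
        for j in range(n):
            if j == i:
                continue  # hold sample i out of the training set
            # Squared Euclidean distance restricted to the subset
            d = sum((X[i][k] - X[j][k]) ** 2 for k in subset)
            if d < best_dist:
                best_dist, best_label = d, y[j]
        correct += int(best_label == y[i])
    return correct / n


def make_toy_data(n=40, seed=0):
    """Hypothetical toy data: feature 0 carries the class signal,
    features 1 and 2 are pure noise."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = i % 2
        informative = 4.0 * label + rng.gauss(0.0, 0.5)
        X.append([informative, rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)])
        y.append(label)
    return X, y


X, y = make_toy_data()
# Wrapper-style scores for two candidate subsets
score_informative = loo_accuracy(X, y, [0])
score_noise = loo_accuracy(X, y, [1, 2])
print("subset [0]:   ", score_informative)
print("subset [1, 2]:", score_noise)
```

The score returned here depends on the chosen learner as well as on the subset, which is the dependence the abstract identifies as a source of bias when the score is read as an estimate of the subset's relevance.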