A Model-Based Relevance Estimation Approach for Feature Selection in Microarray Datasets

  • Authors:
  • Gianluca Bontempi; Patrick E. Meyer

  • Affiliations:
  • Machine Learning Group, Computer Science Department, ULB, Université Libre de Bruxelles, Brussels, Belgium; Machine Learning Group, Computer Science Department, ULB, Université Libre de Bruxelles, Brussels, Belgium

  • Venue:
  • ICANN '08: Proceedings of the 18th International Conference on Artificial Neural Networks, Part II
  • Year:
  • 2008

Abstract

The paper presents an original model-based approach to feature selection and its application to the classification of microarray datasets. Model-based approaches to feature selection are generally referred to as wrappers. Wrapper methods assess subsets of variables according to their usefulness to a given prediction model, which will eventually be used for classification. This strategy assumes that the accuracy of the model used for wrapper selection is a good estimator of the relevance of the feature subset. First, we discuss the limits of this assumption by showing that assessing a subset by means of a generic learner (e.g. by cross-validation) returns a biased estimate of the relevance of the subset itself. Second, we propose a low-bias estimator of relevance based on the cross-validation assessment of an unbiased learner. Third, we assess a feature selection approach that combines the low-bias relevance estimator with state-of-the-art relevance estimators in order to enhance their accuracy. The experimental validation on 20 publicly available cancer expression datasets shows the robustness of a selection approach that is not biased by a specific learner.
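To make the wrapper strategy concrete, here is a minimal sketch of a generic wrapper: greedy forward selection in which each candidate subset is scored by the k-fold cross-validated accuracy of a fixed learner. This is not the authors' method. The nearest-centroid learner, the greedy forward search, and all function names (`nearest_centroid_predict`, `cv_accuracy`, `forward_wrapper`) are hypothetical choices made for illustration. The sketch only shows the kind of learner-dependent score that the abstract argues is a biased estimate of subset relevance.

```python
import random
from typing import List, Sequence


def nearest_centroid_predict(X_train, y_train, X_test, features: Sequence[int]):
    """Hypothetical learner: assign each test row to the closest class centroid,
    using only the selected feature indices."""
    centroids = {}
    for label in sorted(set(y_train)):
        rows = [x for x, lab in zip(X_train, y_train) if lab == label]
        centroids[label] = [sum(r[f] for r in rows) / len(rows) for f in features]

    def sq_dist(x, label):
        c = centroids[label]
        return sum((x[f] - c[i]) ** 2 for i, f in enumerate(features))

    return [min(centroids, key=lambda lab: sq_dist(x, lab)) for x in X_test]


def cv_accuracy(X, y, features: Sequence[int], k: int = 5, seed: int = 0) -> float:
    """k-fold cross-validated accuracy of the learner restricted to `features`.
    In a wrapper, this number is used as the relevance score of the subset."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    correct = 0
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        preds = nearest_centroid_predict(
            [X[i] for i in train], [y[i] for i in train], [X[i] for i in fold], features
        )
        correct += sum(p == y[i] for p, i in zip(preds, fold))
    return correct / len(X)


def forward_wrapper(X, y, n_select: int, k: int = 5) -> List[int]:
    """Greedy forward search: repeatedly add the feature whose inclusion gives
    the highest cross-validated accuracy for the fixed learner."""
    selected: List[int] = []
    remaining = list(range(len(X[0])))
    while remaining and len(selected) < n_select:
        scores = {f: cv_accuracy(X, y, selected + [f], k) for f in remaining}
        best = max(remaining, key=lambda f: scores[f])
        selected.append(best)
        remaining.remove(best)
    return selected


# Synthetic toy data: feature 0 separates the two classes, features 1-4 are noise.
rng = random.Random(1)
y = [i % 2 for i in range(40)]
X = [[lab * 3.0 + rng.gauss(0, 0.3)] + [rng.gauss(0, 1) for _ in range(4)] for lab in y]

print(forward_wrapper(X, y, n_select=2))
```

Because every subset is scored through one particular learner, swapping in a different classifier can change both the scores and the selected subset. This dependence on the learner is what the abstract identifies as the source of bias in wrapper relevance estimates.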