Ensemble logistic regression for feature selection

Authors:
Roman Zakharov; Pierre Dupont
Affiliations:
Machine Learning Group, ICTEAM Institute, Université catholique de Louvain, Louvain-la-Neuve, Belgium (both authors)
Venue:
PRIB'11 Proceedings of the 6th IAPR international conference on Pattern recognition in bioinformatics
Year:
2011

Abstract

This paper describes a novel feature selection algorithm embedded into logistic regression. It specifically addresses high-dimensional data with few observations, which are common in the biomedical domain, for example microarray data. The overall objective is to optimize the predictive performance of a classifier while also favoring sparse and stable models. Feature relevance is first estimated from a simple t-test ranking. This initial relevance is treated as a feature sampling probability, and a multivariate logistic regression is iteratively re-estimated on subsets of features sampled randomly and non-uniformly according to that probability. At each iteration, the sampling probability is adapted according to the predictive performance and the weights of the logistic regression. Globally, the proposed selection method can be seen as an ensemble of logistic regression models voting jointly for the final relevance of features. Experiments on several microarray datasets show that the proposed method offers comparable or better stability and significantly better predictive performance than logistic regression regularized with the Elastic Net. It also outperforms selection based on Random Forests, another popular embedded feature selection method built from an ensemble of classifiers.
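The abstract describes the overall loop but not its exact update rule, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes NumPy, binary labels in {0, 1}, and per-feature standardization. The Welch t-statistic, the L2-regularized logistic regression fitted by gradient descent, the bootstrap/out-of-bag accuracy estimate, and the mixing rule that shifts sampling probability toward features with large |w| (scaled by out-of-bag accuracy) are all hypothetical choices. The names `ensemble_logreg_relevance`, `subset_size` and `alpha` are likewise illustrative.

```python
import numpy as np


def t_statistics(X, y):
    """Absolute Welch t-statistic of each feature between classes 0 and 1."""
    a, b = X[y == 0], X[y == 1]
    num = a.mean(axis=0) - b.mean(axis=0)
    den = np.sqrt(a.var(axis=0, ddof=1) / len(a) + b.var(axis=0, ddof=1) / len(b))
    return np.abs(num) / (den + 1e-12)


def fit_logreg(X, y, l2=1e-2, lr=0.1, n_steps=500):
    """L2-regularized logistic regression fitted by batch gradient descent."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(n_steps):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # predicted P(y = 1)
        w -= lr * (X.T @ (p - y) / n + l2 * w)
        b -= lr * np.mean(p - y)
    return w, b


def ensemble_logreg_relevance(X, y, n_rounds=200, subset_size=10, alpha=0.1, seed=0):
    """Return a feature-relevance distribution (sums to 1) learned by an ensemble
    of logistic regressions on non-uniformly sampled feature subsets.
    The update rule below is a hypothetical choice, not the paper's."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    X = (X - X.mean(axis=0)) / (X.std(axis=0) + 1e-12)  # make |w| comparable

    # Initial relevance from the t-test ranking, used as a sampling distribution.
    prob = t_statistics(X, y) + 1e-12
    prob /= prob.sum()

    for _ in range(n_rounds):
        # Sample a feature subset without replacement, favoring relevant features.
        feats = rng.choice(d, size=subset_size, replace=False, p=prob)
        # Bootstrap training sample; out-of-bag samples estimate accuracy.
        boot = rng.integers(0, n, size=n)
        oob = np.setdiff1d(np.arange(n), boot)
        if oob.size == 0 or np.unique(y[boot]).size < 2:
            continue
        w, b = fit_logreg(X[boot][:, feats], y[boot])
        acc = np.mean(((X[oob][:, feats] @ w + b) > 0) == y[oob])
        if np.abs(w).sum() == 0:
            continue
        # This model's vote: its normalized |weights| on the sampled features.
        vote = np.zeros(d)
        vote[feats] = np.abs(w) / np.abs(w).sum()
        # Hypothetical update: more accurate models move the distribution more.
        step = alpha * acc
        prob = (1 - step) * prob + step * vote
        prob /= prob.sum()
    return prob
```

For example, `rel = ensemble_logreg_relevance(X, y)` followed by `np.argsort(rel)[::-1][:k]` yields a top-k feature subset. A smaller `alpha` keeps the result closer to the initial t-test prior, while a larger one lets the ensemble's votes dominate sooner.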