Monte Carlo feature selection for supervised classification

Authors:
Michał Dramiński;Alvaro Rada-Iglesias;Stefan Enroth;Claes Wadelius;Jacek Koronacki;Jan Komorowski
Venue:
Bioinformatics
Year:
2008

Abstract

Motivation: Pre-selection of informative features for supervised classification is a crucial, albeit delicate, task. Feature selection should identify the features that contribute most to the classification task per se, and which should therefore be used by any classifier subsequently applied to produce classification rules. In this article, a conceptually simple but computer-intensive approach to this task is proposed. Its reliability rests on repeatedly constructing tree classifiers on many training sets randomly drawn from the original sample set, where the samples in each training set are described by only a fraction of all observed features.

Results: The resulting ranking of features may then be used to advantage for classification with a classifier of any type. The approach was validated on the leukemia data of Golub et al. and the lymphoma data of Alizadeh et al. Not surprisingly, the list of genes we obtained differed significantly from those produced by other methods. Biological interpretation of the genes selected by our method showed that several of them are involved in precursors to different types of leukemia and lymphoma, rather than being genes common to several forms of cancer, as is the case for the other methods.

Availability: Prototype available upon request.

Contact: jan.komorowski@lcb.uu.se
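The procedure outlined in the abstract (many tree classifiers, each trained on a random subset of samples and a random subset of features, pooled into one feature ranking) can be sketched in plain Python. This is a minimal illustrative sketch, not the authors' prototype. The abstract does not state how individual trees are turned into feature scores, so the rule used here (each tree's test accuracy multiplied by the information gain of every split on a feature, scaled by the fraction of training samples reaching that node) is an assumption. All function names (`mcfs_rank`, `grow`, `best_split`, `predict`, `entropy`) and parameters `s` (number of random feature subsets), `t` (random train/test splits per subset), `m` (features per subset), `depth` and `train_frac` are hypothetical, and the small entropy-based tree stands in for a full tree classifier.

```python
import math
import random
from collections import Counter, defaultdict


def entropy(labels):
    """Shannon entropy (in bits) of a non-empty list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def best_split(X, y, rows, features):
    """Best (gain, feature, threshold) binary split over `features`, or None."""
    base = entropy([y[i] for i in rows])
    best = None
    for f in features:
        values = sorted({X[i][f] for i in rows})
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2  # midpoint between consecutive observed values
            left = [y[i] for i in rows if X[i][f] <= thr]
            right = [y[i] for i in rows if X[i][f] > thr]
            child = (len(left) * entropy(left) + len(right) * entropy(right)) / len(rows)
            if best is None or base - child > best[0]:
                best = (base - child, f, thr)
    return best


def grow(X, y, rows, features, depth, n_train, usage):
    """Grow a small entropy-based tree; add IG * (node size / n_train) to usage[f]."""
    labels = [y[i] for i in rows]
    majority = max(sorted(set(labels)), key=labels.count)  # deterministic tie-break
    if depth == 0 or len(set(labels)) == 1:
        return majority
    split = best_split(X, y, rows, features)
    if split is None or split[0] <= 0:
        return majority
    gain, f, thr = split
    usage[f] += gain * len(rows) / n_train
    left = [i for i in rows if X[i][f] <= thr]
    right = [i for i in rows if X[i][f] > thr]
    return (f, thr,
            grow(X, y, left, features, depth - 1, n_train, usage),
            grow(X, y, right, features, depth - 1, n_train, usage))


def predict(tree, x):
    """Follow (feature, threshold, left, right) tuples down to a leaf label."""
    while isinstance(tree, tuple):
        f, thr, left, right = tree
        tree = left if x[f] <= thr else right
    return tree


def mcfs_rank(X, y, s=200, t=5, m=3, depth=3, train_frac=0.66, seed=0):
    """Rank features by accuracy-weighted information gain over s*t random trees.

    Assumed scoring rule (not necessarily the authors' formula):
    score[f] += test_accuracy * sum(IG * node_size / n_train) over splits on f.
    """
    rng = random.Random(seed)
    p = len(X[0])
    scores = [0.0] * p
    idx = list(range(len(X)))
    for _ in range(s):                      # s random feature subsets of size m
        features = rng.sample(range(p), m)
        for _ in range(t):                  # t random train/test splits per subset
            rng.shuffle(idx)
            cut = int(train_frac * len(idx))
            train, test = idx[:cut], idx[cut:]
            usage = defaultdict(float)
            tree = grow(X, y, train, features, depth, len(train), usage)
            acc = sum(predict(tree, X[i]) == y[i] for i in test) / len(test)
            for f, contribution in usage.items():
                scores[f] += acc * contribution
    ranking = sorted(range(p), key=lambda f: scores[f], reverse=True)
    return ranking, scores
```

The returned `ranking` lists feature indices from most to least important. In line with the abstract, the top entries of such a ranking could then be passed as input features to a classifier of any type.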