Partial least squares and logistic regression random-effects estimates for gene selection in supervised classification of gene expression data

Authors:
Arief Gusnanto;Alexander Ploner;Farag Shuweihdi;Yudi Pawitan
Venue:
Journal of Biomedical Informatics
Year:
2013

Abstract

Our main interest in supervised classification of gene expression data is to infer whether the expressions can discriminate biological characteristics of samples. With thousands of gene expressions to consider, gene selection has been advocated to decrease classification error by including only the discriminating genes. We propose to base gene selection on partial least squares (PLS) and logistic regression random-effects (RE) estimates before the selected genes are evaluated in classification models. We compare this selection with selection based on the two-sample t-statistics, a current practice, and on modified t-statistics. The results indicate that gene selection based on logistic regression RE estimates is recommended in a general situation, while selection based on the PLS estimates is recommended when the number of samples is low. Gene selection based on the modified t-statistics performs well when the genes exhibit moderate-to-high variability with moderate group separation. Respecting the characteristics of the data is a key aspect to consider in gene selection.
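
To make the idea of score-based gene selection concrete, the following is a minimal sketch and not the authors' procedure. It ranks genes by the pooled-variance two-sample t-statistic and, as a simplified stand-in for a PLS-based score, by the weight vector of the first PLS component for a binary group label. The random-effects estimates and the modified t-statistics described in the abstract are not reproduced here. The function names, the simulated data, and the choice of keeping the top 5 genes are all assumptions made for illustration.

```python
import numpy as np

def t_statistics(X, y):
    """Pooled-variance two-sample t-statistic for each gene (column of X)."""
    a, b = X[y == 0], X[y == 1]
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * a.var(axis=0, ddof=1)
                  + (nb - 1) * b.var(axis=0, ddof=1)) / (na + nb - 2)
    return (b.mean(axis=0) - a.mean(axis=0)) / np.sqrt(pooled_var * (1 / na + 1 / nb))

def pls_first_weights(X, y):
    """Unit-norm weight vector of the first PLS component for one response.

    Simplification: only the first component is used, with no
    random-effects shrinkage.
    """
    Xc = X - X.mean(axis=0)   # centre each gene
    yc = y - y.mean()         # centre the group label
    w = Xc.T @ yc             # covariance direction between genes and label
    return w / np.linalg.norm(w)

def top_genes(scores, k):
    """Indices of the k genes with the largest absolute score."""
    return np.argsort(-np.abs(scores))[:k]

# Hypothetical simulated data: 20 samples, 200 genes,
# with the first 5 genes shifted upwards in group 1.
rng = np.random.default_rng(0)
y = np.repeat([0, 1], 10)
X = rng.normal(size=(20, 200))
X[y == 1, :5] += 3.0

selected_t = top_genes(t_statistics(X, y), 5)
selected_pls = top_genes(pls_first_weights(X, y), 5)
```

In this sketch the selected indices would then be passed to a classifier and assessed on held-out samples. With equal group sizes the first PLS weight vector is proportional to the difference in group means, so unlike the t-statistic it does not standardise by the within-group variability of each gene.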