Logistic regression for disease classification using microarray data

Authors:
J.G. Liao; Khew-Voon Chin
Venue:
Bioinformatics
Year:
2007

Abstract

Motivation: Logistic regression is a standard method for building prediction models for a binary outcome and has been extended for disease classification with microarray data by many authors. A feature (gene) selection step, however, must be added to penalized logistic modeling because the number of genes is large and the number of subjects is small. Model selection for this two-step approach requires new statistical tools because a prediction error estimate that ignores the feature selection step can be severely downward biased. Generic methods such as cross-validation and the non-parametric bootstrap can be very ineffective because of the large variability in the prediction error estimate.

Results: We propose a parametric bootstrap model for more accurate estimation of the prediction error that is tailored to microarray data by borrowing from the extensive research on identifying differentially expressed genes, especially the local false discovery rate. The proposed method provides guidance on the two critical issues in model selection: the number of genes to include in the model and the optimal shrinkage for the penalized logistic regression. We show that selecting more than 20 genes usually helps little in further reducing the prediction error. Application to Golub's leukemia data and our own cervical cancer data leads to highly accurate prediction models.

Availability: R library GeneLogit at http://geocities.com/jg_liao

Contact: jl544@drexel.edu
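The two-step approach described in the abstract (gene selection followed by penalized logistic regression) can be illustrated with a minimal Python sketch. This is not the GeneLogit library and not the authors' parametric bootstrap. It assumes a t-statistic ranking for gene selection, a ridge (L2) penalty as the shrinkage, plain gradient descent for fitting, and synthetic data; all function names and parameter values are hypothetical. The one point it does reflect from the abstract is that the selection step is repeated inside each cross-validation fold, so the error estimate does not ignore feature selection.

```python
import math
import random

def t_scores(X, y):
    """Absolute two-sample (Welch-style) t statistic for each gene (column)."""
    scores = []
    for j in range(len(X[0])):
        g0 = [row[j] for row, lab in zip(X, y) if lab == 0]
        g1 = [row[j] for row, lab in zip(X, y) if lab == 1]
        m0, m1 = sum(g0) / len(g0), sum(g1) / len(g1)
        v0 = sum((v - m0) ** 2 for v in g0) / max(len(g0) - 1, 1)
        v1 = sum((v - m1) ** 2 for v in g1) / max(len(g1) - 1, 1)
        se = math.sqrt(v0 / len(g0) + v1 / len(g1) + 1e-12)
        scores.append(abs(m1 - m0) / se)
    return scores

def select_genes(X, y, k):
    """Step 1 (assumed rule): keep the k genes with the largest |t|."""
    scores = t_scores(X, y)
    return sorted(range(len(scores)), key=lambda j: -scores[j])[:k]

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_ridge_logistic(X, y, lam, steps=300, lr=0.1):
    """Step 2: logistic regression with an L2 penalty lam, by gradient descent."""
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(steps):
        gw, gb = [0.0] * p, 0.0
        for row, lab in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, row))) - lab
            gb += err
            for j in range(p):
                gw[j] += err * row[j]
        # The intercept b is not penalized; the weights are shrunk toward zero.
        w = [wj - lr * (gj / n + lam * wj) for wj, gj in zip(w, gw)]
        b -= lr * gb / n
    return w, b

def predict(w, b, row):
    return 1 if b + sum(wj * xj for wj, xj in zip(w, row)) > 0 else 0

def cv_error(X, y, k, lam, folds=5):
    """Cross-validated error with gene selection redone inside every fold."""
    n, wrong = len(X), 0
    for f in range(folds):
        test = [i for i in range(n) if i % folds == f]
        train = [i for i in range(n) if i % folds != f]
        Xtr, ytr = [X[i] for i in train], [y[i] for i in train]
        genes = select_genes(Xtr, ytr, k)  # uses training subjects only
        w, b = fit_ridge_logistic([[r[j] for j in genes] for r in Xtr], ytr, lam)
        wrong += sum(predict(w, b, [X[i][j] for j in genes]) != y[i] for i in test)
    return wrong / n

def make_data(n=40, p=200, informative=5, shift=1.5, seed=0):
    """Synthetic stand-in for microarray data: few subjects, many genes."""
    rng = random.Random(seed)
    y = [i % 2 for i in range(n)]
    X = [[rng.gauss(shift * lab if j < informative else 0.0, 1.0)
          for j in range(p)] for lab in y]
    return X, y

X, y = make_data()
err = cv_error(X, y, k=10, lam=0.1)
print(f"5-fold CV error with in-fold gene selection: {err:.2f}")
```

Selecting genes once on all subjects and then cross-validating only the logistic fit would let the test subjects influence which genes are chosen, which is the source of the downward bias the abstract warns about. The number of genes `k` and the penalty `lam` correspond to the two model selection issues named in the abstract; the values used here are arbitrary.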