Multi-class tumor classification by discriminant partial least squares using microarray gene expression data and assessment of classification models

Authors:
Yongxi Tan; Leming Shi; Weida Tong; G. T. Gene Hwang; Charles Wang
Affiliations:
Burns and Allen Research Institute Microarray Core, Cedars-Sinai Medical Center, David Geffen School of Medicine, UCLA, Los Angeles, CA 90048, USA; Center for Toxicoinformatics, National Center for Toxicological Research, FDA, Jefferson, AR 72079, USA; Center for Toxicoinformatics, National Center for Toxicological Research, FDA, Jefferson, AR 72079, USA; Department of Mathematics, Cornell University, Ithaca, NY 14850, USA; Burns and Allen Research Institute Microarray Core, Cedars-Sinai Medical Center, David Geffen School of Medicine, UCLA, Los Angeles, CA 90048, USA
Venue:
Computational Biology and Chemistry
Year:
2004

Abstract

High-throughput DNA microarrays provide an effective way to monitor the expression levels of thousands of genes in a sample simultaneously. One promising application of this technology is the molecular diagnosis of cancer, for example distinguishing normal tissue from tumor tissue or classifying tumors into different types or subtypes. A central problem in using microarray data is how to analyze high-dimensional gene expression data, which typically contain thousands of variables (genes) but far fewer observations (samples). Reliable classification methods are therefore needed to make full use of microarray data, together with accurate ways to evaluate the predictive ability and reliability of the resulting models. In this paper, discriminant partial least squares was used to classify different types of human tumors in four microarray datasets and showed good predictive performance. Four cross-validation procedures (leave-one-out versus leave-half-out; incomplete versus full) were used to evaluate the classification models. Our results indicate that, for discriminant partial least squares, leave-half-out cross-validation provides a more realistic estimate of a classification model's predictive ability, which some of the other cross-validation procedures may overestimate, and that the information obtained from the different cross-validation procedures can be used to evaluate the reliability of a classification model.
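The abstract combines two technical ingredients: discriminant partial least squares (a PLS regression of a 0/1 class-indicator matrix on gene expression) and cross-validation schemes that differ in how many samples are held out and in whether gene selection is repeated inside each fold. The following is a minimal NumPy sketch of both, not the authors' code, and it rests on several assumptions. PLS2 weights use NIPALS-style deflation. Each sample is assigned to the class with the largest predicted indicator (an assumed decision rule). The between/within-class sum-of-squares gene score in `select_genes` is a hypothetical choice of selection method. "Full" cross-validation is assumed to mean that gene selection is redone on every training split, while "incomplete" selects genes once from all samples. Leave-half-out is assumed to be repeated random halving. All function names are hypothetical.

```python
import numpy as np


def pls2_fit(X, Y, n_components):
    """PLS2 regression of Y on X with NIPALS-style deflation of both blocks.

    Each weight vector is the dominant left singular vector of E^T F, which is
    the vector the NIPALS inner loop converges to when Y has several columns.
    """
    x_mean, y_mean = X.mean(axis=0), Y.mean(axis=0)
    E, F = X - x_mean, Y - y_mean
    W, P, Q = [], [], []
    for _ in range(n_components):
        w = np.linalg.svd(E.T @ F, full_matrices=False)[0][:, :1]  # unit weight
        t = E @ w                        # sample scores
        tt = (t.T @ t).item()
        p = E.T @ t / tt                 # X loadings
        q = F.T @ t / tt                 # Y loadings
        E = E - t @ p.T                  # deflate X
        F = F - t @ q.T                  # deflate Y
        W.append(w)
        P.append(p)
        Q.append(q)
    W, P, Q = np.hstack(W), np.hstack(P), np.hstack(Q)
    B = W @ np.linalg.inv(P.T @ W) @ Q.T  # coefficients for centered X
    return B, x_mean, y_mean


def dpls_fit(X, y, n_components):
    """Discriminant PLS: regress a 0/1 class-indicator matrix on X."""
    classes = np.unique(y)
    Y = (y[:, None] == classes[None, :]).astype(float)
    B, x_mean, y_mean = pls2_fit(X, Y, n_components)
    return B, x_mean, y_mean, classes


def dpls_predict(model, X):
    """Assumed rule: pick the class with the largest predicted indicator."""
    B, x_mean, y_mean, classes = model
    Y_hat = (X - x_mean) @ B + y_mean
    return classes[np.argmax(Y_hat, axis=1)]


def select_genes(X, y, k):
    """Hypothetical gene filter: top-k genes by between/within-class SS ratio."""
    overall = X.mean(axis=0)
    bss = np.zeros(X.shape[1])
    wss = np.zeros(X.shape[1])
    for c in np.unique(y):
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        bss += len(Xc) * (mc - overall) ** 2
        wss += ((Xc - mc) ** 2).sum(axis=0)
    return np.argsort(bss / (wss + 1e-12))[::-1][:k]


def loo_splits(n):
    """Leave-one-out: every sample is held out exactly once."""
    idx = np.arange(n)
    return [(np.delete(idx, i), np.array([i])) for i in range(n)]


def half_splits(n, n_repeats, rng):
    """Assumed leave-half-out: repeated random (unstratified) halving."""
    splits = []
    for _ in range(n_repeats):
        perm = rng.permutation(n)
        splits.append((perm[: n // 2], perm[n // 2:]))
    return splits


def cv_error_rate(X, y, n_components, k, splits, full=True):
    """Error rate over all held-out predictions.

    full=True reselects genes on each training split; full=False selects them
    once from all samples, so the held-out samples influence the gene list.
    """
    genes_all = None if full else select_genes(X, y, k)
    errors = total = 0
    for train, test in splits:
        genes = select_genes(X[train], y[train], k) if full else genes_all
        model = dpls_fit(X[train][:, genes], y[train], n_components)
        pred = dpls_predict(model, X[test][:, genes])
        errors += np.sum(pred != y[test])
        total += len(test)
    return errors / total


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Pure-noise data: 40 samples, 2000 "genes", 4 random class labels.
    X = rng.normal(size=(40, 2000))
    y = rng.integers(0, 4, size=40)
    for name, splits in [("leave-one-out", loo_splits(40)),
                         ("leave-half-out", half_splits(40, 50, rng))]:
        for full in (False, True):
            err = cv_error_rate(X, y, 3, 20, splits, full=full)
            label = "full" if full else "incomplete"
            print(f"{name:15s} {label:10s} error = {err:.2f}")
```

Because the demo labels are random over four classes, any honest error estimate should sit near 0.75. The incomplete variants will typically report a noticeably lower error, which illustrates how selecting genes outside the cross-validation loop can make a model look better than it is. Stratified splitting would be a natural refinement when some classes are small.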