Robust and Accurate Cancer Classification with Gene Expression Profiling

Authors:
Haifeng Li;Keshu Zhang;Tao Jiang
Affiliations:
University of California at Riverside;Motorola, Inc.;University of California at Riverside
Venue:
CSB '05 Proceedings of the 2005 IEEE Computational Systems Bioinformatics Conference
Year:
2005



Abstract

Robust and accurate cancer classification is critical in cancer treatment. Gene expression profiling is expected to enable us to diagnose tumors precisely and systematically. However, the classification task in this context is very challenging because of the curse of dimensionality and the small sample size problem. In this paper, we propose a novel method to solve these two problems. Our method is able to map gene expression data into a very low-dimensional space and thus meets the recommended samples-to-features-per-class ratio. As a result, it can be used to classify new samples robustly with low and trustworthy (estimated) error rates. The method is based on linear discriminant analysis (LDA). However, conventional LDA requires that the within-class scatter matrix S_w be nonsingular. Unfortunately, S_w is always singular in the case of cancer classification due to the small sample size problem. To overcome this problem, we develop a generalized linear discriminant analysis (GLDA) that is a general, direct, and complete solution to optimize Fisher's criterion. GLDA is mathematically well-founded and coincides with conventional LDA when S_w is nonsingular. Unlike conventional LDA, GLDA does not assume the nonsingularity of S_w, and thus naturally solves the small sample size problem. To accommodate the high dimensionality of the scatter matrices, a fast algorithm for GLDA is also developed. Our extensive experiments on seven public cancer datasets show that the method performs well. In particular, on some difficult instances that have very small samples-to-genes-per-class ratios, our method achieves much higher accuracies than widely used classification methods such as support vector machines and random forests.
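
The singularity of S_w mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' GLDA algorithm, which the abstract does not specify. It only builds the standard within-class and between-class scatter matrices on synthetic data and checks their ranks. The data sizes (20 samples, 200 "genes", 2 classes), the random data, and the helper name `scatter_matrices` are assumptions chosen for illustration.

```python
import numpy as np


def scatter_matrices(X, y):
    """Return the within-class (S_w) and between-class (S_b) scatter matrices.

    X has shape (n_samples, n_features); y holds one class label per sample.
    """
    n_features = X.shape[1]
    overall_mean = X.mean(axis=0)
    S_w = np.zeros((n_features, n_features))
    S_b = np.zeros((n_features, n_features))
    for label in np.unique(y):
        X_c = X[y == label]
        mean_c = X_c.mean(axis=0)
        centered = X_c - mean_c
        # Sum of outer products of the samples centered on their class mean.
        S_w += centered.T @ centered
        # Class-size-weighted outer product of the class mean offset.
        diff = (mean_c - overall_mean)[:, None]
        S_b += X_c.shape[0] * (diff @ diff.T)
    return S_w, S_b


# Hypothetical "small sample size" setting: far fewer samples than features.
rng = np.random.default_rng(0)
n_samples, n_genes, n_classes = 20, 200, 2
X = rng.normal(size=(n_samples, n_genes))
y = np.repeat(np.arange(n_classes), n_samples // n_classes)

S_w, S_b = scatter_matrices(X, y)
rank_w = np.linalg.matrix_rank(S_w)
rank_b = np.linalg.matrix_rank(S_b)

# rank(S_w) is at most n_samples - n_classes, so a 200 x 200 S_w built from
# 20 samples cannot be inverted as conventional LDA requires.
print(f"S_w is {S_w.shape[0]} x {S_w.shape[1]} with rank {rank_w}")
print(f"S_b has rank {rank_b} (at most n_classes - 1)")
```

Because the rank of S_w is bounded by the number of samples minus the number of classes, any dataset with more genes than samples yields a singular S_w, which is the situation the abstract describes for cancer classification.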