Robust and accurate cancer classification is critical in cancer treatment. Gene expression profiling is expected to enable precise and systematic tumor diagnosis. However, classification in this setting is very challenging because of the curse of dimensionality and the small sample size problem: there are far more genes (features) than samples. In this paper, we propose a novel method that addresses both problems. Our method maps gene expression data into a very low-dimensional space and thus meets the recommended samples-to-features-per-class ratio. As a result, it can classify new samples robustly, with low and trustworthy (estimated) error rates. The method is based on linear discriminant analysis (LDA). Conventional LDA requires the within-class scatter matrix S_w to be nonsingular. Unfortunately, S_w is always singular in cancer classification because of the small sample size problem. To overcome this, we develop a generalized linear discriminant analysis (GLDA) that is a general, direct, and complete solution for optimizing Fisher's criterion. GLDA is mathematically well-founded and coincides with conventional LDA when S_w is nonsingular. Unlike conventional LDA, GLDA does not assume that S_w is nonsingular, and thus naturally solves the small sample size problem. To accommodate the high dimensionality of the scatter matrices, we also develop a fast algorithm for GLDA. Our extensive experiments on seven public cancer datasets show that the method performs well. In particular, on some difficult instances with very small samples-to-genes-per-class ratios, our method achieves much higher accuracy than widely used classification methods such as support vector machines and random forests.
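
To make the singularity issue concrete, the following minimal sketch builds S_w for synthetic data with far more features than samples and checks its rank. With n samples in c classes, the rank of S_w is at most n - c, so S_w cannot be inverted once the number of features exceeds that bound. The sketch then uses a pseudoinverse-based variant of LDA as a stand-in workaround. This is not the GLDA algorithm from the paper, whose details are not given in the abstract; the data, the function names `scatter_matrices` and `pinv_lda_directions`, and all parameter values are assumptions chosen purely for illustration.

```python
import numpy as np

def scatter_matrices(X, y):
    """Return the within-class (S_w) and between-class (S_b) scatter matrices."""
    d = X.shape[1]
    mean_all = X.mean(axis=0)
    S_w = np.zeros((d, d))
    S_b = np.zeros((d, d))
    for c in np.unique(y):
        Xc = X[y == c]
        mean_c = Xc.mean(axis=0)
        centered = Xc - mean_c
        S_w += centered.T @ centered
        diff = (mean_c - mean_all)[:, None]
        S_b += Xc.shape[0] * (diff @ diff.T)
    return S_w, S_b

def pinv_lda_directions(X, y, n_components):
    """Pseudoinverse LDA (illustrative stand-in, NOT the paper's GLDA).

    Replaces the inverse of S_w with its Moore-Penrose pseudoinverse and
    keeps the leading eigenvectors of pinv(S_w) @ S_b.
    """
    S_w, S_b = scatter_matrices(X, y)
    M = np.linalg.pinv(S_w) @ S_b
    eigvals, eigvecs = np.linalg.eig(M)
    order = np.argsort(-eigvals.real)[:n_components]
    return eigvecs[:, order].real

# Synthetic "gene expression" data (assumed): 20 samples, 200 features, 2 classes.
rng = np.random.default_rng(0)
n_per_class, d = 10, 200
X = np.vstack([
    rng.normal(0.0, 1.0, (n_per_class, d)),
    rng.normal(0.5, 1.0, (n_per_class, d)),
])
y = np.array([0] * n_per_class + [1] * n_per_class)

S_w, S_b = scatter_matrices(X, y)
rank_sw = np.linalg.matrix_rank(S_w)
print("features:", d, "rank of S_w:", rank_sw)  # rank is far below d, so S_w is singular

# Project onto a single discriminant direction (at most c - 1 = 1 for two classes).
W = pinv_lda_directions(X, y, n_components=1)
Z = X @ W
print("projected shape:", Z.shape)
```

The projection reduces each 200-dimensional sample to a single coordinate, which illustrates how a discriminant mapping can restore a favorable samples-to-features ratio; whether the resulting classifier generalizes would still need to be checked on held-out samples.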