Systematic benchmarking of microarray data classification: assessing the role of non-linearity and dimensionality reduction

  • Authors:
  • Nathalie Pochet; Frank De Smet; Johan A. K. Suykens; Bart L. R. De Moor

  • Affiliation (all authors):
  • ESAT-SCD (SISTA), K.U. Leuven, Kasteelpark Arenberg 10, 3001 Leuven-Heverlee, Belgium

  • Venue:
  • Bioinformatics
  • Year:
  • 2004

Abstract

Motivation: Microarrays can determine the expression levels of thousands of genes simultaneously. In combination with classification methods, this technology can support clinical management decisions for individual patients, e.g. in oncology. The aim of this paper is to systematically benchmark the role of non-linear versus linear techniques and of dimensionality reduction methods.

Results: A systematic benchmarking study is performed by comparing linear versions of standard classification and dimensionality reduction techniques with their non-linear versions based on a radial basis function (RBF) kernel. A total of 9 binary cancer classification problems, derived from 7 publicly available microarray datasets, and 20 randomizations of each problem are examined.

Conclusions: Three main conclusions can be drawn from the performances on independent test sets. (1) When classifying with least squares support vector machines (LS-SVMs) without dimensionality reduction, RBF kernels can be used without risking too much overfitting: well-tuned RBF kernels never perform worse than a linear kernel, and are sometimes statistically significantly better, in terms of test set receiver operating characteristic (ROC) and accuracy. (2) Even for linear classifiers such as the LS-SVM with a linear kernel, regularization is very important. (3) When performing kernel principal component analysis (kernel PCA) before classification, using an RBF kernel for kernel PCA tends to result in overfitting, especially in combination with supervised feature selection; an optimal selection of a large number of features is often an indication of overfitting. Kernel PCA with a linear kernel gives better results.

Availability: Matlab scripts are available on request.

Supplementary information: http://www.esat.kuleuven.ac.be/~npochet/Bioinformatics/
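
The benchmarking design summarized above can be illustrated with a short sketch. The paper itself uses LS-SVMs and kernel PCA implemented in Matlab (scripts available on request); the code below is only an illustrative approximation, not the authors' code: it substitutes scikit-learn's SVC and KernelPCA, a synthetic dataset standing in for microarray data, and a single train/test split, so all function names, parameter grids and data are assumptions rather than the published setup.

```python
# Illustrative sketch (not the paper's Matlab/LS-SVMlab code): compare linear
# and RBF-kernel classifiers, with and without kernel PCA, on one randomized
# train/test split, reporting test accuracy and ROC AUC.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import KernelPCA
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for a microarray problem: few samples, many "genes".
X, y = make_classification(n_samples=100, n_features=2000, n_informative=50,
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

configs = {
    "linear SVM": make_pipeline(StandardScaler(), SVC(kernel="linear")),
    "RBF SVM": make_pipeline(StandardScaler(), SVC(kernel="rbf")),
    "linear kPCA + linear SVM": make_pipeline(
        StandardScaler(), KernelPCA(n_components=20, kernel="linear"),
        SVC(kernel="linear")),
    "RBF kPCA + linear SVM": make_pipeline(
        StandardScaler(), KernelPCA(n_components=20, kernel="rbf"),
        SVC(kernel="linear")),
}

for name, pipe in configs.items():
    # Tune the regularization constant C (and the kernel width gamma for RBF
    # kernels) by cross-validation on the training set only.
    grid = {"svc__C": [0.01, 0.1, 1, 10, 100]}
    if pipe.named_steps["svc"].kernel == "rbf":
        grid["svc__gamma"] = ["scale", 1e-4, 1e-3, 1e-2]
    search = GridSearchCV(pipe, grid, cv=5).fit(X_tr, y_tr)
    scores = search.decision_function(X_te)
    print(f"{name:26s} acc={accuracy_score(y_te, search.predict(X_te)):.3f} "
          f"auc={roc_auc_score(y_te, scores):.3f}")
```

The cross-validated grid search over C (and gamma) mimics the paper's emphasis on tuning the regularization and kernel parameters on training data only, and the four configurations mirror the comparisons behind conclusions (1) to (3): linear versus RBF classification kernels, and linear versus RBF kernel PCA before a linear classifier.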