Gene selection in cancer classification using sparse logistic regression with Bayesian regularization

Authors:
Gavin C. Cawley; Nicola L. C. Talbot
Affiliations:
School of Computing Sciences, University of East Anglia, Norwich NR4 7TJ, UK (both authors)
Venue:
Bioinformatics
Year:
2006

Abstract

Motivation: Gene selection algorithms for cancer classification, based on the expression of a small number of biomarker genes, have been the subject of considerable research in recent years. Shevade and Keerthi propose a gene selection algorithm based on sparse logistic regression (SLogReg) that incorporates a Laplace prior to promote sparsity in the model parameters, and provide a simple but efficient training procedure. The degree of sparsity obtained is determined by the value of a regularization parameter, which must be carefully tuned to optimize performance. This normally involves a model selection stage based on a computationally intensive search for the minimizer of the cross-validation error. In this paper, we demonstrate that a simple Bayesian approach can eliminate this regularization parameter entirely, by integrating it out analytically using an uninformative Jeffreys prior. The improved algorithm (BLogReg) is typically two or three orders of magnitude faster than the original, as there is no longer any need for a model selection step. The BLogReg algorithm is also free from selection bias in performance estimation, a common pitfall in the application of machine learning algorithms to cancer classification.

Results: The SLogReg, BLogReg and Relevance Vector Machine (RVM) gene selection algorithms are evaluated on the well-studied colon cancer and leukaemia benchmark datasets. The leave-one-out estimates of the probability of test error and of the cross-entropy are very similar for BLogReg and SLogReg; however, BLogReg is considerably faster. Using nested cross-validation to avoid selection bias, performance estimation for SLogReg on the leukaemia dataset takes almost 48 h, whereas the corresponding result for BLogReg is obtained in only 1 min 24 s, making BLogReg by far the more practical algorithm. BLogReg also provides better estimates of conditional probability than the RVM, at similar computational expense; such estimates are of great importance in medical applications.

Availability: A MATLAB implementation of the sparse logistic regression algorithm with Bayesian regularization (BLogReg) is available from http://theoval.cmp.uea.ac.uk/~gcc/cbl/blogreg/

Contact: gcc@cmp.uea.ac.uk
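The regularization idea in the abstract can be illustrated with a minimal Python sketch. It is for illustration only and is not the authors' MATLAB implementation. It assumes a Laplace prior p(w | alpha) ∝ (alpha/2)^N exp(-alpha E_W), with E_W = sum_i |w_i| over the N active weights, and a Jeffreys prior p(alpha) ∝ 1/alpha. Integrating alpha out gives -log p(w) = N log E_W + const, so training minimizes E_D + N log E_W, where E_D is the negative log-likelihood. At a stationary point this behaves like an ordinary L1 penalty with alpha = N / E_W. The sketch alternates between two steps: solving the L1-penalized problem for fixed alpha, and re-estimating alpha = N / E_W. For the first step it uses a plain proximal-gradient (ISTA) loop in place of Shevade and Keerthi's own training procedure. The function names `blogreg_sketch` and `loo_estimate`, the step-size rule, the starting value `alpha0`, the iteration counts and the demo data are all hypothetical choices.

```python
import math
import random

# Hypothetical helper names; a sketch of Bayesian-regularized sparse
# logistic regression, not the authors' BLogReg code.


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_threshold(v, t):
    """Proximal operator of t*|v|: shrinks v towards zero by t."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def blogreg_sketch(X, y, alpha0=1.0, n_outer=10, n_inner=100):
    """L1-penalized logistic regression whose penalty alpha is re-estimated
    as N_active / E_W instead of being tuned by cross-validation.
    X: list of feature rows; y: list of labels in {0, 1}. Returns (w, b, alpha)."""
    n, d = len(X), len(X[0])
    w, b, alpha = [0.0] * d, 0.0, alpha0
    # Step size 1/L, where L = 0.25 * ||[X, 1]||_F^2 bounds the curvature of E_D
    step = 1.0 / (0.25 * (sum(v * v for row in X for v in row) + n))
    for _ in range(n_outer):
        for _ in range(n_inner):
            # Gradient of E_D, the negative log-likelihood summed over samples
            gw, gb = [0.0] * d, 0.0
            for row, t in zip(X, y):
                r = sigmoid(sum(wj * xj for wj, xj in zip(w, row)) + b) - t
                gb += r
                for j in range(d):
                    gw[j] += r * row[j]
            # ISTA step on E_D + alpha * sum|w|; the bias b is not penalized
            w = [soft_threshold(w[j] - step * gw[j], step * alpha) for j in range(d)]
            b -= step * gb
        e_w = sum(abs(v) for v in w)
        n_active = sum(1 for v in w if v != 0.0)
        if n_active == 0:
            break  # every weight pruned; keep the last alpha
        alpha = n_active / e_w  # Bayesian re-estimate: alpha = N / E_W
    return w, b, alpha


def loo_estimate(X, y, **kw):
    """Leave-one-out test error and mean cross-entropy. alpha is re-estimated
    inside every fold, so no inner model-selection loop is required."""
    errors, ce = 0, 0.0
    for i in range(len(X)):
        Xtr, ytr = X[:i] + X[i + 1:], y[:i] + y[i + 1:]
        w, b, _ = blogreg_sketch(Xtr, ytr, **kw)
        p = sigmoid(sum(wj * xj for wj, xj in zip(w, X[i])) + b)
        p = min(max(p, 1e-12), 1.0 - 1e-12)  # guard the logarithms
        errors += int((p > 0.5) != (y[i] == 1))
        ce -= math.log(p) if y[i] == 1 else math.log(1.0 - p)
    return errors / len(X), ce / len(X)


if __name__ == "__main__":
    # Invented demo data: feature 0 is informative, features 1-2 are noise
    rng = random.Random(0)
    X, y = [], []
    for _ in range(30):
        row = [rng.gauss(0, 1) for _ in range(3)]
        X.append(row)
        y.append(1 if row[0] + 0.5 * rng.gauss(0, 1) > 0 else 0)
    w, b, alpha = blogreg_sketch(X, y)
    print("weights:", w, "bias:", b, "alpha:", alpha)
    print("LOO error, cross-entropy:", loo_estimate(X, y))
```

Because alpha is re-estimated inside every call to `blogreg_sketch`, `loo_estimate` needs no nested search over the regularization parameter. The abstract credits this property for BLogReg's freedom from selection bias and for its speed advantage over SLogReg.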