Biomarker Identification and Cancer Classification Based on Microarray Data Using Laplace Naive Bayes Model with Mean Shrinkage

Authors:
Meng-Yun Wu; Dao-Qing Dai; Yu Shi; Hong Yan; Xiao-Fei Zhang
Affiliations:
Sun Yat-Sen University, Guangzhou; Sun Yat-Sen University, Guangzhou; Zhengzhou Normal University, Zhengzhou; City University of Hong Kong, Hong Kong and University of Sydney, NSW; Sun Yat-Sen University, Guangzhou
Venue:
IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB)
Year:
2012

Abstract

Biomarker identification and cancer classification are two closely related problems. In gene expression data sets, genes that share a biological pathway can be highly correlated. Moreover, gene expression data may contain outliers arising from chemical or electrical causes. A good gene selection method should therefore take such group effects into account and be robust to outliers. In this paper, we propose a Laplace naive Bayes model with mean shrinkage (LNB-MS). We use the Laplace distribution rather than the normal distribution as the class-conditional distribution of the samples because it is less sensitive to outliers and has been applied in many fields. The key technique is an L_1 penalty imposed on the mean of each class, which yields automatic feature selection. The objective function of the proposed model is piecewise linear in the mean of each class, so its optimal value can be found simply by evaluating the function at the breakpoints. We design an efficient algorithm to estimate the model parameters and introduce a new strategy that uses the number of selected features to control the regularization parameter. Experimental results on simulated data sets and 17 publicly available cancer data sets attest to the accuracy, sparsity, efficiency, and robustness of the proposed algorithm. Many of the biomarkers identified by our method have been verified in biochemical or biomedical research. An analysis of the biological and functional correlation of the selected genes based on Gene Ontology (GO) terms shows that the proposed method guarantees the simultaneous selection of highly correlated genes.
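The abstract describes the estimator only at a high level. The Python sketch below illustrates one plausible reading of it and is not the authors' algorithm. It assumes the following. Each class location is written as a per-feature global median plus a class offset, and the L_1 penalty acts on that offset. A single Laplace scale per feature is shared by all classes and estimated once from the unpenalized class medians. The regularization parameter is chosen by bisection so that at most a target number of features is selected. All function names (`shrink_offset`, `fit_lnb_ms`, `selected_features`, `predict`, `lam_for_n_features`) are hypothetical. The sketch shows why the piecewise linear objective only needs to be evaluated at its breakpoints, and how a feature count can steer the regularization parameter.

```python
import math
from statistics import median


def shrink_offset(y, b, lam):
    """Minimise f(d) = sum_i |y_i - d| / b + lam * |d|.

    f is convex and piecewise linear in d, so a minimiser is always one of the
    breakpoints {0} U {y_i}; evaluating f there is enough. Zero is tried first
    and only beaten by a strictly smaller value, so ties favour sparsity.
    """
    best_d, best_f = 0.0, sum(abs(v) for v in y) / b
    for c in y:
        f = sum(abs(v - c) for v in y) / b + lam * abs(c)
        if f < best_f - 1e-12 * (1.0 + best_f):
            best_d, best_f = c, f
    return best_d


def fit_lnb_ms(X, labels, lam, eps=1e-8):
    """Fit class locations mu_kj = center_j + offset_kj with an L1 penalty on offsets.

    Assumptions (not taken from the paper): center_j is the global median of
    feature j, and the Laplace scale b_j is shared by all classes and estimated
    once as the mean absolute deviation from the unpenalised class medians.
    """
    classes = sorted(set(labels))
    n, p = len(X), len(X[0])
    groups = {k: [x for x, lab in zip(X, labels) if lab == k] for k in classes}
    center = [median(x[j] for x in X) for j in range(p)]
    scale = []
    for j in range(p):
        dev = 0.0
        for k in classes:
            med = median(x[j] for x in groups[k])
            dev += sum(abs(x[j] - med) for x in groups[k])
        scale.append(dev / n + eps)  # eps guards against constant features
    offsets = {
        k: [shrink_offset([x[j] - center[j] for x in groups[k]], scale[j], lam)
            for j in range(p)]
        for k in classes
    }
    priors = {k: len(groups[k]) / n for k in classes}
    return {"classes": classes, "center": center, "scale": scale,
            "offsets": offsets, "priors": priors}


def selected_features(model):
    """A feature is selected if at least one class offset is non-zero."""
    p = len(model["center"])
    return [j for j in range(p)
            if any(model["offsets"][k][j] != 0.0 for k in model["classes"])]


def predict(model, x):
    """Naive Bayes rule with Laplace class-conditional densities.

    The -log(2 b_j) terms are identical for every class (shared scale), so
    they are dropped from the score.
    """
    def score(k):
        s = math.log(model["priors"][k])
        for j, xj in enumerate(x):
            mu = model["center"][j] + model["offsets"][k][j]
            s -= abs(xj - mu) / model["scale"][j]
        return s
    return max(model["classes"], key=score)


def lam_for_n_features(X, labels, target, iters=50):
    """Bisection for (approximately) the smallest lam selecting <= target features."""
    n = len(X)
    # Above n / min_j b_j the penalty slope beats the data-term slope, so every offset is 0.
    hi = n / min(fit_lnb_ms(X, labels, 0.0)["scale"]) + 1.0
    lo = 0.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if len(selected_features(fit_lnb_ms(X, labels, mid))) <= target:
            hi = mid
        else:
            lo = mid
    return hi
```

For example, with two classes separated only along the first of three features, `lam_for_n_features(X, labels, 1)` gives a model whose `selected_features` is `[0]`. An unselected feature has the same location and scale in every class, so it adds the same term to every class score, and only selected features influence `predict`. The breakpoint search costs O(n_k^2) per class and feature, which is acceptable for illustration. Because the minimizer of this objective is a weighted median of the points {y_i} and 0, a weighted-median routine would find a minimizer more efficiently.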