Biomarker identification and cancer classification are two closely related problems. In gene expression data sets, genes that share a biological pathway are often highly correlated, and the measurements may contain outliers caused by chemical or electrical artifacts. A good gene selection method should therefore account for such group effects and be robust to outliers. In this paper, we propose a Laplace naive Bayes model with mean shrinkage (LNB-MS). The Laplace distribution is used in place of the normal distribution as the conditional distribution of the samples because it is less sensitive to outliers and has been applied successfully in many fields. The key technique is an L_1 penalty imposed on the mean of each class, which achieves automatic feature selection. The resulting objective function is piecewise linear with respect to each class mean, so its optimal value can be found simply by evaluating it at the breakpoints. We design an efficient algorithm to estimate the model parameters and introduce a new strategy that controls the regularization parameter through the number of selected features. Experimental results on simulated data sets and 17 publicly available cancer data sets attest to the accuracy, sparsity, efficiency, and robustness of the proposed algorithm. Many of the biomarkers identified by our method have been verified in biochemical or biomedical research, and an analysis of the biological and functional correlation of the selected genes based on Gene Ontology (GO) terms shows that the method selects highly correlated genes simultaneously.
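The abstract does not spell out the estimator, but the breakpoint idea can be illustrated with a minimal sketch: for a single gene and class, an L_1-penalized Laplace location fit minimizes a sum of absolute deviations plus a penalty term, which is piecewise linear in the mean, so it suffices to evaluate the objective at the finitely many breakpoints (the sample values and zero). The function name and the toy data below are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def laplace_l1_mean(x, lam):
    """Minimize sum_i |x_i - mu| + lam * |mu| over mu (illustrative sketch).

    The objective is piecewise linear in mu, so its minimum is attained
    at one of the breakpoints: the observed sample values or zero.
    """
    candidates = np.append(x, 0.0)                     # all breakpoints
    objective = lambda mu: np.abs(x - mu).sum() + lam * abs(mu)
    values = [objective(mu) for mu in candidates]
    return float(candidates[int(np.argmin(values))])

# One outlier at 5.0: the Laplace (absolute-error) fit stays near the
# bulk of the data, unlike a squared-error mean estimate.
x = np.array([0.9, 1.1, 1.0, 5.0])
print(laplace_l1_mean(x, lam=0.5))    # robust, median-like estimate
print(laplace_l1_mean(x, lam=10.0))   # shrunk exactly to 0.0 (feature deselected)
```

A sufficiently large penalty drives the class mean exactly to zero, which is what turns the penalized fit into an automatic feature-selection rule.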