Simultaneous cancer classification and gene selection with Bayesian nearest neighbor method: An integrated approach

Authors:
Sounak Chakraborty
Affiliations:
Department of Statistics, University of Missouri-Columbia, 209F Middlebush Hall, Columbia, MO 65211-6100, USA
Venue:
Computational Statistics & Data Analysis
Year:
2009

Abstract

Because most cancer treatments carry some degree of toxicity, it is essential to identify the cancer type correctly before administering the relevant therapy. With the arrival of powerful tools such as gene expression microarrays, the basis of cancer classification is gradually shifting from morphological properties to molecular signatures. Several recent studies have demonstrated that gene expression microarray measurements predict tumor type markedly more accurately than clinical markers. The main challenge in working with gene expression microarrays is the very large number of genes, of which only a small fraction is actually relevant for differentiating between cancer types. A Bayesian nearest neighbor model equipped with an integrated variable selection technique is proposed to overcome this challenge. This classification and gene selection model classifies different cancer types accurately and simultaneously identifies the relevant genes. The proposed model is fully automatic in the sense that it adaptively chooses both the neighborhood size and the important covariates. The method is applied successfully to three simulated data sets and four well-known real data sets. To demonstrate its competitiveness, it is also compared with several popular off-the-shelf classification methods. On all the simulated and real data sets, the proposed method produced results that were highly competitive, if not better. Whereas the standard approach builds the model in two steps, gene selection followed by tumor prediction, this adaptive gene selection technique selects the relevant genes and predicts the tumor class in a single step. The biological relevance of the selected genes is also discussed to validate the claim.
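
The idea of choosing the neighborhood size and the relevant genes jointly, rather than in two separate steps, can be illustrated with a small sketch. This is not the paper's Bayesian model: the abstract does not give the priors or the sampler, so the code below substitutes a simple Metropolis-style stochastic search over a binary gene-inclusion mask and k, scored by leave-one-out k-NN accuracy minus a sparsity penalty. The function names (`loo_knn_accuracy`, `stochastic_search`) and the parameters `penalty`, `temp`, and `k_values` are all hypothetical choices made for illustration.

```python
import math
import random


def loo_knn_accuracy(X, y, mask, k):
    """Leave-one-out accuracy of k-NN using only features where mask is True."""
    idx = [j for j, keep in enumerate(mask) if keep]
    if not idx:
        return 0.0  # no features selected: treat as the worst possible model
    n = len(X)
    correct = 0
    for i in range(n):
        # Squared Euclidean distance to every other sample, on selected features.
        dists = sorted(
            (sum((X[i][j] - X[m][j]) ** 2 for j in idx), m)
            for m in range(n)
            if m != i
        )
        votes = {}
        for _, m in dists[:k]:
            votes[y[m]] = votes.get(y[m], 0) + 1
        # Majority vote; ties are broken in favor of the smallest label.
        pred = max(sorted(votes), key=lambda c: votes[c])
        correct += pred == y[i]
    return correct / n


def stochastic_search(X, y, k_values=(1, 3, 5), n_iter=300,
                      penalty=0.02, temp=0.05, seed=0):
    """Jointly search over (feature mask, k); an illustrative stand-in only."""
    rng = random.Random(seed)
    p = len(X[0])

    def score(mask, k):
        # Assumed objective: accuracy minus a penalty per selected feature.
        return loo_knn_accuracy(X, y, mask, k) - penalty * sum(mask)

    mask = [rng.random() < 0.5 for _ in range(p)]
    k = rng.choice(k_values)
    cur = score(mask, k)
    best = (cur, list(mask), k)
    for _ in range(n_iter):
        new_mask, new_k = list(mask), k
        if rng.random() < 0.8:
            j = rng.randrange(p)          # flip one gene in or out
            new_mask[j] = not new_mask[j]
        else:
            new_k = rng.choice(k_values)  # propose a new neighborhood size
        new = score(new_mask, new_k)
        # Metropolis-style acceptance: always accept improvements,
        # sometimes accept a worse state to escape local optima.
        if new >= cur or rng.random() < math.exp((new - cur) / temp):
            mask, k, cur = new_mask, new_k, new
            if cur > best[0]:
                best = (cur, list(mask), k)
    return best  # (penalized score, feature mask, k)
```

Calling `stochastic_search(X, y)` with `X` as a list of expression vectors and `y` as integer class labels returns the best penalized score found, the corresponding gene mask, and the chosen k. On real microarray data the exhaustive distance computation here would be slow; the sketch is meant only to show the single-pass structure.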