Robust approach for estimating probabilities in Naïve-Bayes Classifier for gene expression data

  • Authors:
  • B. Chandra;Manish Gupta

  • Affiliations:
  • Department of Mathematics, Indian Institute of Technology, Delhi, Hauz Khas, New Delhi 110 016, India;Institute for Systems Studies and Analyses, Defence R&D Organisation, Metcalfe House, Delhi 110 054, India

  • Venue:
  • Expert Systems with Applications: An International Journal
  • Year:
  • 2011

Quantified Score

Hi-index 12.05

Visualization

Abstract

Naive-Bayes Classifier (NBC) is widely used for classification in machine learning. It is considered as the first choice for many classification problems because of its simplicity and classification accuracy as compared to other supervised learning methods. However, for high dimensional data like gene expression data, it does not perform well due to two major limitations i.e. underflow and overfitting. In order to address the problem of underflow, the existing approach adopted is to add the logarithms of probabilities rather than multiplying probabilities and the estimate approach is used for providing solution to overfitting problem. However, in practice for gene expression data, these approaches do not perform well. In this paper, a novel approach has been proposed to overcome the limitations using a robust function for estimating probabilities in Naive-Bayes Classifier. The proposed method not only resolves the limitation of NBC but also improves the classification accuracy for gene expression data. The method has been tested over several benchmark gene expression datasets of high dimension. Comparative results of proposed Robust Naive-Bayes Classifier (R-NBC) and existing NBC for gene expression data have also been illustrated to highlight the effectiveness of the R-NBC. Simulation study has also been performed to depict the robustness of the R-NBC over the existing approaches.