Medical Datamining with a New Algorithm for Feature Selection and Naive Bayesian Classifier

  • Authors:
  • Ranjit Abraham;Jay B. Simha;S. S. Iyengar

  • Venue:
  • ICIT '07 Proceedings of the 10th International Conference on Information Technology
  • Year:
  • 2007

Abstract

Much research in data mining has gone into improving the predictive accuracy of statistical classifiers by applying discretization and feature selection. As a probability-based statistical classification method, the Naïve Bayesian classifier has gained wide popularity despite its assumption that attributes are conditionally independent of one another given the class label. In this paper we propose a new feature selection algorithm, CHI-WSS, to improve the classification accuracy of Naïve Bayes on medical datasets. Our experimental results on 17 medical datasets suggest that, on average, the new CHI-WSS algorithm gave the best results. The proposed algorithm uses discretization and simplifies 'wrapper'-based feature selection by first reducing the feature dimensionality: irrelevant and least relevant features are eliminated using chi-square statistics. In our experiments we use two established measures to compare the performance of statistical classifiers, namely classification accuracy (or error rate) and the area under the ROC curve, to demonstrate that the proposed algorithm with the generative Naïve Bayesian classifier is, on average, more efficient than the discriminative models Logistic Regression and Support Vector Machine.
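
The abstract gives no pseudocode for CHI-WSS, so the following is only a minimal sketch of the general idea it describes: rank already-discretized features by a chi-square statistic, drop the least relevant ones, then run a wrapper search with Naïve Bayes over the survivors. Everything specific here is an assumption made for illustration and is not taken from the paper: the function names, the cut-off of 3.841 (the 0.05 critical value for one degree of freedom, appropriate only for binary features and a binary class), the greedy forward search, leave-one-out accuracy as the wrapper score, and the toy data.

```python
from collections import Counter
from math import log

def chi_square(column, labels):
    """Pearson chi-square statistic between one discrete feature and the class."""
    n = len(labels)
    observed = Counter(zip(column, labels))
    value_totals, class_totals = Counter(column), Counter(labels)
    stat = 0.0
    for v, nv in value_totals.items():
        for c, nc in class_totals.items():
            expected = nv * nc / n
            stat += (observed[(v, c)] - expected) ** 2 / expected
    return stat

def nb_predict(train_X, train_y, row, features):
    """Categorical Naive Bayes with Laplace smoothing, using only `features`."""
    best_class, best_score = None, float("-inf")
    for c, nc in Counter(train_y).items():
        score = log(nc / len(train_y))  # log prior
        for f in features:
            n_values = len({r[f] for r in train_X})
            matches = sum(1 for r, y in zip(train_X, train_y)
                          if y == c and r[f] == row[f])
            score += log((matches + 1) / (nc + n_values))  # smoothed likelihood
        if score > best_score:
            best_class, best_score = c, score
    return best_class

def loo_accuracy(X, y, features):
    """Leave-one-out accuracy of Naive Bayes on a feature subset."""
    hits = 0
    for i in range(len(X)):
        pred = nb_predict(X[:i] + X[i + 1:], y[:i] + y[i + 1:], X[i], features)
        hits += pred == y[i]
    return hits / len(X)

def chi_filter_then_wrapper(X, y, critical=3.841):
    """Filter by chi-square (assumed cut-off), then greedy forward wrapper search."""
    scores = {f: chi_square([r[f] for r in X], y) for f in range(len(X[0]))}
    candidates = sorted((f for f in scores if scores[f] >= critical),
                        key=lambda f: -scores[f])
    selected, best = [], loo_accuracy(X, y, [])
    improved = True
    while improved and candidates:
        improved = False
        for f in candidates:
            acc = loo_accuracy(X, y, selected + [f])
            if acc > best:
                best, best_f, improved = acc, f, True
        if improved:
            selected.append(best_f)
            candidates.remove(best_f)
    return selected, best, scores

# Hypothetical toy data: feature 0 tracks the class, feature 1 is noise,
# feature 2 is constant.
X = [[1, 0, 0], [1, 1, 0], [1, 0, 0], [1, 1, 0],
     [0, 0, 0], [0, 1, 0], [0, 0, 0], [0, 1, 0]]
y = [1, 1, 1, 1, 0, 0, 0, 0]
selected, accuracy, scores = chi_filter_then_wrapper(X, y)
print(selected, accuracy, scores)
```

The point of the filter step in this sketch is cost: the wrapper retrains and re-evaluates the classifier for every candidate subset, so removing low-scoring features beforehand shrinks the search the wrapper has to perform.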