A visual tool for bayesian data analysis: the impact of smoothing on naive bayes text classifiers

  • Authors:
  • Giorgio Maria Di Nunzio;Alessandro Sordoni

  • Affiliations:
  • University of Padua, Padua, Italy;Université de Montréal, Montréal, Canada

  • Venue:
  • SIGIR '12 Proceedings of the 35th international ACM SIGIR conference on Research and development in information retrieval
  • Year:
  • 2012

Quantified Score

Hi-index 0.00

Visualization

Abstract

Naive-Bayes (NB) classifiers are simple probabilistic classifiers still widely used in supervised learning due to their tradeoff between efficient model training and good empirical results. One of the drawbacks of these classifiers is that in situations of data sparsity (i.e. when the size of training set is small) the maximum likelihood estimation of the probability of unseen features in these situations is equal to zero causing arithmetic anomalies. To prevent this undesirable behavior, a number of smoothing techniques have been proposed. Among these, the Bayesian approach incorporates smoothing in terms of prior knowledge about the parameters of the model usually called hyper-parameters. Our research question is: can a visualization tool help researchers to quickly assess the goodness of the performance of NB classifiers by setting optimal smoothing parameters?