A visual tool for Bayesian data analysis: the impact of smoothing on naive Bayes text classifiers

Authors:
Giorgio Maria Di Nunzio; Alessandro Sordoni
Affiliations:
University of Padua, Padua, Italy; Université de Montréal, Montréal, Canada
Venue:
SIGIR '12 Proceedings of the 35th international ACM SIGIR conference on Research and development in information retrieval
Year:
2012

Abstract

Naive Bayes (NB) classifiers are simple probabilistic classifiers that are still widely used in supervised learning because they combine efficient model training with good empirical results. One drawback of these classifiers is that under data sparsity (i.e., when the training set is small), the maximum likelihood estimate of the probability of an unseen feature is zero, which causes arithmetic anomalies. To prevent this undesirable behavior, a number of smoothing techniques have been proposed. Among these, the Bayesian approach expresses smoothing as prior knowledge about the parameters of the model; the parameters of this prior are usually called hyper-parameters. Our research question is: can a visualization tool help researchers quickly assess the performance of NB classifiers by setting optimal smoothing parameters?
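
The zero-probability problem and its Bayesian remedy can be illustrated with a minimal sketch. The abstract does not say which NB model or prior the authors use, so the code below assumes a multinomial model with a symmetric Dirichlet prior, one common choice. The function names, the toy vocabulary, and the hyper-parameter name `alpha` are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def word_probs(tokens, vocab, alpha):
    """Estimate P(word | class) from the training tokens of one class.

    alpha = 0 gives the maximum likelihood estimate.
    alpha > 0 gives the posterior mean under a symmetric Dirichlet(alpha)
    prior (an assumed prior, not necessarily the one used in the paper).
    """
    counts = Counter(tokens)
    total = len(tokens) + alpha * len(vocab)
    return {w: (counts[w] + alpha) / total for w in vocab}


def log_likelihood(doc, probs):
    """Sum of log P(word | class) over the document; -inf if any term is zero."""
    return sum(
        math.log(probs[w]) if probs[w] > 0 else float("-inf") for w in doc
    )


# Toy data (hypothetical): a tiny training set for a "sports" class.
vocab = ["ball", "goal", "vote", "law"]
sports_tokens = ["ball", "goal", "ball", "goal", "ball"]
doc = ["ball", "vote"]  # "vote" never occurs in the sports training data

mle = word_probs(sports_tokens, vocab, alpha=0.0)
smoothed = word_probs(sports_tokens, vocab, alpha=1.0)

ll_mle = log_likelihood(doc, mle)            # one unseen word zeroes the product
ll_smoothed = log_likelihood(doc, smoothed)  # finite once the prior adds mass
```

With `alpha = 0`, the single unseen word "vote" drives the class likelihood of the whole document to zero. Any positive `alpha` keeps it finite, and the value chosen for `alpha` is the kind of smoothing hyper-parameter the research question refers to.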