A study of smoothing methods for language models applied to information retrieval
ACM Transactions on Information Systems (TOIS)
International Journal of Approximate Reasoning
A Visual Analysis of the Effects of Assumptions of Classical Probabilistic Models
Proceedings of the 2013 Conference on the Theory of Information Retrieval
Naive Bayes (NB) classifiers are simple probabilistic classifiers that remain widely used in supervised learning because they combine efficient model training with good empirical results. One drawback of these classifiers is that under data sparsity (i.e., when the training set is small), the maximum likelihood estimate of the probability of an unseen feature is zero, which causes arithmetic anomalies. To prevent this undesirable behavior, a number of smoothing techniques have been proposed. Among these, the Bayesian approach incorporates smoothing as prior knowledge about the parameters of the model, expressed through what are usually called hyper-parameters. Our research question is: can a visualization tool help researchers quickly assess how well NB classifiers perform and choose optimal smoothing parameters?
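The zero-probability problem and its Bayesian remedy can be illustrated with a minimal sketch. It assumes a single binary (Bernoulli) feature and a Beta prior whose hyper-parameters are named `alpha` and `beta` here. The function names and the toy counts are hypothetical and are not taken from the paper.

```python
import math


def mle_estimate(count: int, total: int) -> float:
    """Maximum likelihood estimate of P(feature present | class)."""
    return count / total


def beta_smoothed_estimate(count: int, total: int,
                           alpha: float = 1.0, beta: float = 1.0) -> float:
    """Posterior mean of P(feature present | class) under a Beta(alpha, beta) prior.

    alpha and beta are the hyper-parameters (assumed names); alpha = beta = 1
    corresponds to Laplace (add-one) smoothing.
    """
    return (count + alpha) / (total + alpha + beta)


def safe_log(p: float) -> float:
    """Log-probability; returns -inf instead of raising an error when p is zero."""
    return math.log(p) if p > 0 else float("-inf")


# Hypothetical toy data: the feature appears in 0 of 4 training documents of a class.
count, total = 0, 4

p_mle = mle_estimate(count, total)
p_smooth = beta_smoothed_estimate(count, total)

# The unsmoothed estimate is zero, so its log-likelihood term is -inf and
# dominates the class score regardless of all other features.
print("MLE:     ", p_mle, safe_log(p_mle))
# The smoothed estimate stays strictly positive, so the score remains finite.
print("Smoothed:", p_smooth, safe_log(p_smooth))
```

Varying `alpha` and `beta` changes how strongly the prior pulls the estimate away from the observed frequency, which is the kind of parameter choice the research question is concerned with.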