Improved Naive Bayes for Extremely Skewed Misclassification Costs

  • Authors:
  • Aleksander Kołcz, Abdur Chowdhury

  • Affiliations:
  • AOL, Inc., Dulles, VA (both authors)

  • Venue:
  • PKDD'05: Proceedings of the 9th European Conference on Principles and Practice of Knowledge Discovery in Databases
  • Year:
  • 2005

Abstract

Naive Bayes has been an effective and important classifier in the text categorization domain despite violations of its underlying assumptions. Although quite accurate, it tends to provide poor estimates of the posterior class probabilities, which hampers its application in cost-sensitive settings. The apparent high confidence with which certain errors are made is particularly problematic when misclassification costs are highly skewed, since a conservative setting of the decision threshold may greatly decrease the classifier's utility. We propose an extension of the Naive Bayes algorithm that aims to discount the confidence with which errors are made. The approach is based on measuring the amount of change to the feature distribution necessary to reverse the initial classifier decision, and it can be implemented efficiently without over-complicating the process of Naive Bayes induction. In experiments with three benchmark document collections, decision-reversal Naive Bayes is demonstrated to substantially improve over the popular multinomial version of the Naive Bayes algorithm, in some cases performing more than 40% better.
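
The abstract does not spell out the scoring rule, but the general idea can be illustrated with a minimal sketch: score a document with a standard multinomial Naive Bayes model, then estimate how much the document's term counts would have to change to flip the sign of the log-odds, and use that quantity as a discounted confidence. Everything below (the function name, the one-token-swap approximation of "amount of change") is an illustrative assumption, not the authors' implementation.

```python
# Minimal sketch (not the paper's exact algorithm): multinomial Naive Bayes
# log-odds plus a crude estimate of how many single-token swaps would be
# needed to reverse the decision, where each swap replaces one occurrence of
# the most decision-supporting term with the most decision-opposing term.
import numpy as np

def decision_reversal_score(counts, log_prior, log_lik):
    """counts: (V,) term counts of one document.
    log_prior: (2,) log class priors.
    log_lik: (2, V) log P(term | class) for the negative (0) and positive (1) class."""
    # Standard multinomial NB log-odds of the positive class vs. the negative class.
    llr = log_lik[1] - log_lik[0]                       # per-term log-likelihood ratio
    margin = (log_prior[1] - log_prior[0]) + counts @ llr
    decision = 1 if margin >= 0 else 0

    # Largest per-token swing achievable by one swap, taken in the direction
    # that opposes the current decision (only terms present in the document
    # can be swapped out).
    present = counts > 0
    if decision == 1:
        best_swing = llr[present].max() - llr.min()
    else:
        best_swing = llr.max() - llr[present].min()
    best_swing = max(best_swing, 1e-12)                 # guard against division by zero

    # Few tokens needed to reverse the decision => the decision rests on a
    # thin margin and its apparent confidence should be discounted.
    tokens_to_reverse = abs(margin) / best_swing
    return decision, margin, tokens_to_reverse
```

Under this toy reading, a small `tokens_to_reverse` flags a decision that a tiny perturbation of the document would overturn, so a cost-sensitive thresholding scheme could treat it as low-confidence; the paper's actual mechanism and its efficiency claims should be taken from the full text.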