In data mining applications characterized by highly skewed misclassification costs, certain types of errors become virtually unacceptable. This limits the utility of a classifier to the operating range in which such constraints can be met. Naive Bayes, which has proven very useful in text mining applications due to its high scalability, can be particularly affected: although its 0/1 loss tends to be small, its misclassifications are often made with apparently high confidence. Aside from efforts to better calibrate Naive Bayes scores, it has been shown that its accuracy depends on document sparsity, and that feature selection can lead to marked improvement in classification performance. Traditionally, sparsity is controlled globally, so the result for any particular document may vary. In this work we examine the merits of local sparsity control for Naive Bayes in the context of highly asymmetric misclassification costs. In experiments with three benchmark document collections, we demonstrate clear advantages of document-level feature selection. In the extreme cost setting, multinomial Naive Bayes with local sparsity control outperforms even some recently proposed effective improvements to the Naive Bayes classifier. There are also indications that local feature selection may be preferable in other cost settings.
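The abstract does not spell out the local selection criterion, but the core idea — training a standard multinomial Naive Bayes model and then, at prediction time, scoring each document using only its most discriminative terms — can be sketched as follows. Ranking a document's terms by the magnitude of their class-conditional log-likelihood ratio is an illustrative stand-in for whatever metric the paper actually uses; the function names, the toy data, and the cutoff `k` are all hypothetical.

```python
import math
from collections import Counter

def train_multinomial_nb(docs, labels, alpha=1.0):
    """Estimate log priors and Laplace-smoothed log term probabilities per class."""
    classes = sorted(set(labels))
    vocab = sorted({w for d in docs for w in d})
    counts = {c: Counter() for c in classes}
    for d, y in zip(docs, labels):
        counts[y].update(d)
    log_prior = {c: math.log(labels.count(c) / len(labels)) for c in classes}
    log_cond = {}
    for c in classes:
        total = sum(counts[c].values()) + alpha * len(vocab)
        log_cond[c] = {w: math.log((counts[c][w] + alpha) / total) for w in vocab}
    return classes, log_prior, log_cond

def predict_local(doc, classes, log_prior, log_cond, k=None):
    """Classify with document-level feature selection: keep only the k terms
    of this document whose class-conditional log-likelihood ratio has the
    largest magnitude (binary case); k=None means no local selection."""
    terms = [w for w in doc if all(w in log_cond[c] for c in classes)]
    if k is not None and len(classes) == 2:
        c0, c1 = classes
        terms.sort(key=lambda w: abs(log_cond[c0][w] - log_cond[c1][w]),
                   reverse=True)
        terms = terms[:k]
    scores = {c: log_prior[c] + sum(log_cond[c][w] for w in terms)
              for c in classes}
    return max(scores, key=scores.get)

# Toy usage: global model, per-document sparsity controlled by k.
docs = [["cheap", "pills", "buy"], ["buy", "cheap", "offer"],
        ["meeting", "notes", "agenda"], ["project", "meeting", "notes"]]
labels = ["spam", "spam", "ham", "ham"]
classes, lp, lc = train_multinomial_nb(docs, labels)
print(predict_local(["cheap", "buy", "offer", "meeting"], classes, lp, lc, k=3))
```

Note that the model itself is trained once, globally; only the evidence admitted per document varies, which is what distinguishes this from conventional global feature selection.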