Multinomial naive Bayes (MNB) is a popular method for document classification due to its computational efficiency and relatively good predictive performance. It has recently been established that predictive performance can be improved further by appropriate data transformations [1,2]. In this paper we present another transformation that is designed to combat a potential problem with the application of MNB to unbalanced datasets. We propose an appropriate correction by adjusting attribute priors. This correction can be implemented as another data normalization step, and we show that it can significantly improve the area under the ROC curve. We also show that the modified version of MNB is very closely related to the simple centroid-based classifier and compare the two methods empirically.
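The abstract does not spell out the correction, so the following is only a minimal sketch of one plausible reading: before Laplace smoothing, the word counts of each class are rescaled to the same total mass, so that the smoothing pseudocount carries equal weight in a large class and in a small one. The function names `fit_mnb` and `predict_mnb`, the parameter `class_mass`, and the pseudocount of 1 are assumptions made for illustration and are not taken from the paper.

```python
import numpy as np

def fit_mnb(X, y, class_mass=1.0, normalize_per_class=True):
    """Fit a multinomial naive Bayes model on a document-term count matrix.

    X: (n_docs, n_words) non-negative counts; y: (n_docs,) class labels.
    Returns (classes, log_prior, log_word_prob).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    classes = np.unique(y)

    # Aggregate word counts per class: shape (n_classes, n_words).
    counts = np.vstack([X[y == c].sum(axis=0) for c in classes])

    if normalize_per_class:
        # Assumed correction: rescale every class to the same total count,
        # so the Laplace pseudocount below has equal influence in each class.
        totals = counts.sum(axis=1, keepdims=True)
        counts = class_mass * counts / np.maximum(totals, 1e-12)

    # Laplace smoothing with a pseudocount of 1 per word (an assumption).
    smoothed = counts + 1.0
    log_word_prob = np.log(smoothed / smoothed.sum(axis=1, keepdims=True))

    # Class priors estimated from the class frequencies in the training data.
    log_prior = np.log(np.array([(y == c).mean() for c in classes]))
    return classes, log_prior, log_word_prob

def predict_mnb(model, X):
    """Assign each document to the class with the highest log posterior score."""
    classes, log_prior, log_word_prob = model
    scores = np.asarray(X, dtype=float) @ log_word_prob.T + log_prior
    return classes[np.argmax(scores, axis=1)]

# Toy unbalanced data: four documents in class 0, one document in class 1.
X_train = [[5, 0, 1], [4, 1, 0], [6, 0, 0], [3, 0, 1], [0, 2, 0]]
y_train = [0, 0, 0, 0, 1]
model = fit_mnb(X_train, y_train)
predictions = predict_mnb(model, [[4, 0, 0], [0, 3, 0]])
```

Under this reading, each class is scored by a dot product between the document vector and a single per-class weight vector derived from normalized counts, which is one way to see the resemblance to a centroid-based classifier. Setting `normalize_per_class=False` gives standard Laplace-smoothed MNB for comparison.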