Techniques for Improving the Performance of Naive Bayes for Text Classification

Authors:
Karl-Michael Schneider
Affiliations:
Department of General Linguistics, University of Passau, Passau, Germany
Venue:
CICLing'05 Proceedings of the 6th international conference on Computational Linguistics and Intelligent Text Processing
Year:
2005

Abstract

Naive Bayes is often used in text classification applications and experiments because of its simplicity and effectiveness. However, its performance is often degraded by three problems: it does not model text well, feature selection is often inappropriate, and it lacks reliable confidence scores. We address these problems and show that they can be solved by some simple corrections. We demonstrate that these simple modifications significantly improve the performance of Naive Bayes for text classification.
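
For readers unfamiliar with the baseline, the following is a minimal sketch of a standard multinomial Naive Bayes text classifier with add-one (Laplace) smoothing. It illustrates only the kind of unmodified model the abstract refers to, not the corrections proposed in the paper. The function names (`train`, `posterior`, `classify`) and the toy documents are assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def train(docs):
    """Fit a multinomial Naive Bayes model.

    docs: list of (tokens, label) pairs (hypothetical toy format).
    Returns (log_prior, log_likelihood, vocabulary).
    """
    class_docs = Counter()              # number of documents per class
    word_counts = defaultdict(Counter)  # word frequencies per class
    vocab = set()
    for tokens, label in docs:
        class_docs[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)

    n_docs = sum(class_docs.values())
    log_prior = {c: math.log(n / n_docs) for c, n in class_docs.items()}

    log_lik = {}
    for c in class_docs:
        # Add-one smoothing: every vocabulary word gets one pseudo-count.
        denom = sum(word_counts[c].values()) + len(vocab)
        log_lik[c] = {w: math.log((word_counts[c][w] + 1) / denom) for w in vocab}
    return log_prior, log_lik, vocab


def posterior(tokens, model):
    """Return normalized class posteriors for a tokenized document."""
    log_prior, log_lik, vocab = model
    scores = {}
    for c in log_prior:
        score = log_prior[c]
        for w in tokens:
            if w in vocab:  # words unseen in training are ignored
                score += log_lik[c][w]
        scores[c] = score
    # Subtract the maximum before exponentiating for numerical stability.
    top = max(scores.values())
    unnorm = {c: math.exp(s - top) for c, s in scores.items()}
    z = sum(unnorm.values())
    return {c: v / z for c, v in unnorm.items()}


def classify(tokens, model):
    """Return the class with the highest posterior probability."""
    post = posterior(tokens, model)
    return max(post, key=post.get)


# Toy training data, invented purely for illustration.
TRAIN_DOCS = [
    ("cheap pills buy now".split(), "spam"),
    ("buy cheap watches".split(), "spam"),
    ("meeting agenda for monday".split(), "ham"),
    ("project meeting notes".split(), "ham"),
]
model = train(TRAIN_DOCS)
print(classify("cheap pills".split(), model))
print(posterior("cheap pills".split(), model))
```

The normalized posteriors returned by `posterior` are the quantities one would read as confidence scores in this baseline; the abstract reports that such scores are not reliable without correction.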