Some Effective Techniques for Naive Bayes Text Classification

Authors:
Sang-Bum Kim;Kyoung-Soo Han;Hae-Chang Rim;Sung Hyon Myaeng
Venue:
IEEE Transactions on Knowledge and Data Engineering
Year:
2006

Citing 14
Cited 22

Representation and learning in information retrieval

Representation and learning in information retrieval
An example-based mapping method for text categorization and retrieval

ACM Transactions on Information Systems (TOIS)
On the Optimality of the Simple Bayesian Classifier under Zero-One Loss

Machine Learning - Special issue on learning with probabilistic representations
Inductive learning algorithms and representations for text categorization

Proceedings of the seventh international conference on Information and knowledge management
A re-examination of text categorization methods

Proceedings of the 22nd annual international ACM SIGIR conference on Research and development in information retrieval
Text Classification from Labeled and Unlabeled Documents using EM

Machine Learning - Special issue on information retrieval
BoosTexter: A Boosting-based System for Text Categorization

Machine Learning - Special issue on information retrieval
Naive (Bayes) at Forty: The Independence Assumption in Information Retrieval

ECML '98 Proceedings of the 10th European Conference on Machine Learning
Text Categorization with Support Vector Machines: Learning with Many Relevant Features

ECML '98 Proceedings of the 10th European Conference on Machine Learning
Feature Subset Selection in Text-Learning

ECML '98 Proceedings of the 10th European Conference on Machine Learning
A Comparative Study on Feature Selection in Text Categorization

ICML '97 Proceedings of the Fourteenth International Conference on Machine Learning
A Probabilistic Analysis of the Rocchio Algorithm with TFIDF for Text Categorization

ICML '97 Proceedings of the Fourteenth International Conference on Machine Learning
Employing EM and Pool-Based Active Learning for Text Classification

ICML '98 Proceedings of the Fifteenth International Conference on Machine Learning
An Empirical Study of Feature Selection for Text Categorization based on Term Weightage

WI '04 Proceedings of the 2004 IEEE/WIC/ACM International Conference on Web Intelligence

Automatic syllabus classification

Proceedings of the 7th ACM/IEEE-CS joint conference on Digital libraries
Raising the baseline for high-precision text classifiers

Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining
Similarity computing model of high dimension data for symptom classification of Chinese traditional medicine

Applied Soft Computing
Integrating tags in a semantic content-based recommender

Proceedings of the 2008 ACM conference on Recommender systems
Feature selection for text classification with Naïve Bayes

Expert Systems with Applications: An International Journal
An Empirical Study of Category Skew on Feature Selection for Text Categorization

Canadian AI '09 Proceedings of the 22nd Canadian Conference on Artificial Intelligence: Advances in Artificial Intelligence
Simultaneous Product Attribute Name and Value Extraction from Web Pages

WI-IAT '09 Proceedings of the 2009 IEEE/WIC/ACM International Joint Conference on Web Intelligence and Intelligent Agent Technology - Volume 03
A Software System for Topic Extraction and Document Classification

WI-IAT '09 Proceedings of the 2009 IEEE/WIC/ACM International Joint Conference on Web Intelligence and Intelligent Agent Technology - Volume 01
Intelligent steganalytic system: application on natural language environment

WSEAS Transactions on Systems and Control
Improving the performance of Naive Bayes multinomial in e-mail foldering by introducing distribution-based balance of datasets

Expert Systems with Applications: An International Journal
A brief survey on sequence classification

ACM SIGKDD Explorations Newsletter
A granular agent evolutionary algorithm for classification

Applied Soft Computing
A technique for improving the performance of naive bayes text classification

WISM'11 Proceedings of the 2011 international conference on Web information systems and mining - Volume Part II
Learning feature-projection based classifiers

Expert Systems with Applications: An International Journal
Secure collaboration in global design and supply chain environment: Problem analysis and literature review

Computers in Industry
A folksonomy-based recommender system for personalized access to digital artworks

Journal on Computing and Cultural Heritage (JOCCH)
Machine learning in building a collection of computer science course syllabi

TPDL'12 Proceedings of the Second international conference on Theory and Practice of Digital Libraries
The Effect of Stemming on Arabic Text Classification: An Empirical Study

International Journal of Information Retrieval Research
Categorical proportional difference: a feature selection method for text categorization

AusDM '08 Proceedings of the 7th Australasian Data Mining Conference - Volume 87
Building a search engine for computer science course syllabi

Proceedings of the 13th ACM/IEEE-CS joint conference on Digital libraries
A comparative analysis of classification algorithms in data mining for accuracy, speed and robustness

Information Technology and Management
Generalized Dirichlet priors for Naïve Bayesian classifiers with multinomial models in document classification

Data Mining and Knowledge Discovery

Abstract

While naive Bayes is quite effective in various data mining tasks, it gives disappointing results on the automatic text classification problem. By observing how naive Bayes behaves on natural language text, we found a serious problem in the parameter estimation process, which causes poor results in the text classification domain. In this paper, we propose two empirical heuristics: per-document text normalization and a feature weighting method. While these are somewhat ad hoc methods, our proposed naive Bayes text classifier performs very well on the standard benchmark collections, competing with state-of-the-art text classifiers based on highly complex learning methods such as SVM.
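
The two heuristics named in the abstract can be illustrated with a minimal sketch. This is not the authors' formulation, which the abstract does not give. It assumes that "per-document text normalization" means dividing each document's term counts by the document length before they are pooled into class statistics, and that "feature weighting" means scaling each term's log-likelihood by a per-term weight. The function names `train_nb` and `classify`, the smoothing parameter `alpha`, and the `weights` dictionary are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def train_nb(docs, labels, normalize=True, alpha=1.0):
    """Train a multinomial naive Bayes model on tokenized documents.

    Assumption: with normalize=True, each document's term counts are
    divided by its length, so long documents do not dominate the
    class-level estimates. alpha is an additive smoothing constant.
    """
    vocab = sorted({t for d in docs for t in d})
    class_docs = Counter(labels)
    term_mass = defaultdict(Counter)
    for tokens, label in zip(docs, labels):
        scale = len(tokens) if (normalize and tokens) else 1
        for term, count in Counter(tokens).items():
            term_mass[label][term] += count / scale

    model = {"prior": {}, "loglik": {}}
    for label, n_c in class_docs.items():
        model["prior"][label] = math.log(n_c / len(docs))
        total = sum(term_mass[label].values()) + alpha * len(vocab)
        model["loglik"][label] = {
            t: math.log((term_mass[label][t] + alpha) / total) for t in vocab
        }
    return model


def classify(model, tokens, weights=None):
    """Return the highest-scoring class for a tokenized document.

    Assumption: weights maps a term to a multiplier on its
    log-likelihood; unlisted terms get weight 1.0, and terms outside
    the training vocabulary are ignored.
    """
    weights = weights or {}
    best_label, best_score = None, float("-inf")
    for label, prior in model["prior"].items():
        score = prior
        for t in tokens:
            if t in model["loglik"][label]:
                score += weights.get(t, 1.0) * model["loglik"][label][t]
        if score > best_score:
            best_label, best_score = label, score
    return best_label


docs = [
    ["cheap", "pills", "cheap"],
    ["meeting", "agenda", "notes", "meeting"],
    ["cheap", "offer"],
    ["agenda", "meeting"],
]
labels = ["spam", "ham", "spam", "ham"]
model = train_nb(docs, labels)
print(classify(model, ["cheap", "pills"]))
print(classify(model, ["cheap", "meeting"], weights={"cheap": 0.0}))
```

Setting a term's weight to zero removes its influence on the decision entirely. In practice the weights would presumably come from a feature-scoring statistic, but the choice of statistic is left open here.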