Multinomial Naive Bayes for Text Categorization Revisited

  • Authors:
  • Ashraf M. Kibriya; Eibe Frank; Bernhard Pfahringer; Geoffrey Holmes

  • Affiliations:
  • Department of Computer Science, University of Waikato, Hamilton, New Zealand (all authors)

  • Venue:
  • AI'04: Proceedings of the 17th Australian Joint Conference on Advances in Artificial Intelligence
  • Year:
  • 2004

Abstract

This paper presents empirical results for several versions of the multinomial naive Bayes classifier on four text categorization problems, and a way of improving it using locally weighted learning. More specifically, it compares standard multinomial naive Bayes to the recently proposed transformed weight-normalized complement naive Bayes classifier (TWCNB) [1], and shows that some of the modifications included in TWCNB may not be necessary to achieve optimum performance on some datasets. However, it does show that TF-IDF conversion and document length normalization are important. It also shows that support vector machines can, in fact, sometimes significantly outperform both methods. Finally, it shows how the performance of multinomial naive Bayes can be improved using locally weighted learning. However, the overall conclusion of our paper is that support vector machines are still the method of choice if the aim is to maximize accuracy.
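The pipeline the abstract highlights — TF-IDF conversion and document-length normalization followed by multinomial naive Bayes — can be sketched in plain Python. This is a simplified illustration under common conventions (log TF, log IDF, L2 length normalization, Laplace smoothing), not the authors' implementation; all function names and the toy corpus are hypothetical.

```python
import math
from collections import Counter, defaultdict

def tfidf_normalize(docs):
    """Apply log-TF * IDF weighting, then L2 length normalization.
    docs: list of token lists. Returns a sparse dict vector per document."""
    n = len(docs)
    df = Counter()
    for d in docs:
        df.update(set(d))
    vecs = []
    for d in docs:
        tf = Counter(d)
        v = {t: (1 + math.log(c)) * math.log(n / df[t]) for t, c in tf.items()}
        norm = math.sqrt(sum(w * w for w in v.values())) or 1.0
        vecs.append({t: w / norm for t, w in v.items()})
    return vecs

class MultinomialNB:
    """Multinomial naive Bayes over (possibly fractional) term weights,
    with Laplace smoothing; a sketch, not the paper's exact variant."""
    def fit(self, vecs, labels):
        self.vocab = {t for v in vecs for t in v}
        self.classes = sorted(set(labels))
        counts = {c: defaultdict(float) for c in self.classes}
        totals = {c: 0.0 for c in self.classes}
        prior = Counter(labels)
        for v, y in zip(vecs, labels):
            for t, w in v.items():
                counts[y][t] += w
                totals[y] += w
        self.log_prior = {c: math.log(prior[c] / len(labels)) for c in self.classes}
        V = len(self.vocab)
        self.log_cond = {c: {t: math.log((counts[c][t] + 1.0) / (totals[c] + V))
                             for t in self.vocab} for c in self.classes}
        return self

    def predict(self, vec):
        # Score each class by log prior plus weighted log conditionals.
        def score(c):
            return self.log_prior[c] + sum(
                w * self.log_cond[c][t] for t, w in vec.items() if t in self.vocab)
        return max(self.classes, key=score)

# Toy corpus (hypothetical data, for illustration only).
train_docs = [["ball", "game", "team", "ball"], ["team", "game", "win"],
              ["code", "compiler", "bug"], ["bug", "code", "crash"]]
labels = ["sport", "sport", "tech", "tech"]
nb = MultinomialNB().fit(tfidf_normalize(train_docs), labels)
print(nb.predict({"ball": 1.0, "game": 1.0}))  # classifies as "sport"
```

Dropping the `tfidf_normalize` step and feeding raw counts into `fit` gives the standard multinomial model, which is one of the comparisons the paper's experiments make.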