Feature selection for text classification with Naïve Bayes

Authors:
Jingnian Chen;Houkuan Huang;Shengfeng Tian;Youli Qu
Affiliations:
School of Computer and Information Technology, Beijing Jiaotong University, Beijing 100044, China and Department of Information and Computing Science, Shandong University of Finance, Jinan, Shando ...;School of Computer and Information Technology, Beijing Jiaotong University, Beijing 100044, China;School of Computer and Information Technology, Beijing Jiaotong University, Beijing 100044, China;School of Computer and Information Technology, Beijing Jiaotong University, Beijing 100044, China
Venue:
Expert Systems with Applications: An International Journal
Year:
2009

Abstract

As an important preprocessing step in text classification, feature selection can improve the scalability, efficiency and accuracy of a text classifier. In general, a good feature selection method should take both domain and algorithm characteristics into account. Because the Naive Bayesian classifier is simple, efficient and highly sensitive to feature selection, research on feature selection designed specifically for it is worthwhile. This paper presents two feature evaluation metrics for the Naive Bayesian classifier applied to multi-class text datasets: Multi-class Odds Ratio (MOR) and Class Discriminating Measure (CDM). Text classification experiments with Naive Bayesian classifiers were carried out on two multi-class text collections. The results indicate that CDM and MOR select features noticeably more effectively than the other feature selection approaches compared.
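The abstract names the two metrics but does not define them, so the following Python sketch is illustrative only. It assumes one plausible formulation: for each term, CDM sums over classes the absolute log ratio of P(w|c) to P(w|not c), and MOR sums the absolute log odds ratio of the same two probabilities. The probability estimates (smoothed document frequencies), the smoothing constant, and all function names are assumptions made for this example, not details taken from the paper.

```python
import math
from collections import Counter


def class_term_probs(docs, labels, smoothing=1.0):
    """Estimate P(w|c) and P(w|not c) as smoothed document frequencies.

    Assumption: a term "occurs" in a document if it appears at least once.
    Smoothing keeps every estimate strictly between 0 and 1.
    """
    classes = sorted(set(labels))
    vocab = sorted({w for doc in docs for w in doc})
    n_docs = Counter(labels)
    df = {c: Counter() for c in classes}
    for doc, c in zip(docs, labels):
        df[c].update(set(doc))
    total = len(docs)
    probs = {}
    for w in vocab:
        df_all = sum(df[c][w] for c in classes)
        for c in classes:
            p_in = (df[c][w] + smoothing) / (n_docs[c] + 2 * smoothing)
            p_out = (df_all - df[c][w] + smoothing) / (
                total - n_docs[c] + 2 * smoothing
            )
            probs[(w, c)] = (p_in, p_out)
    return vocab, classes, probs


def cdm_score(w, classes, probs):
    """Assumed CDM: sum over classes of |log(P(w|c) / P(w|not c))|."""
    return sum(
        abs(math.log(probs[(w, c)][0] / probs[(w, c)][1])) for c in classes
    )


def mor_score(w, classes, probs):
    """Assumed MOR: sum over classes of the absolute log odds ratio."""
    score = 0.0
    for c in classes:
        p_in, p_out = probs[(w, c)]
        score += abs(math.log(p_in * (1 - p_out) / ((1 - p_in) * p_out)))
    return score


def top_k(docs, labels, k, score=cdm_score):
    """Return the k highest-scoring terms under the given metric."""
    vocab, classes, probs = class_term_probs(docs, labels)
    return sorted(vocab, key=lambda w: score(w, classes, probs), reverse=True)[:k]
```

Under these assumptions, a term that occurs equally often inside and outside every class scores zero on both metrics, while a term concentrated in one class scores high. The selected terms would then form the vocabulary passed to a Naive Bayes classifier.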