Maximum entropy modeling with feature selection for text categorization

Authors:
Jihong Cai;Fei Song
Affiliations:
Department of Computing and Information Science, University of Guelph, Guelph, Ontario, Canada;Department of Computing and Information Science, University of Guelph, Guelph, Ontario, Canada
Venue:
AIRS'08 Proceedings of the 4th Asia information retrieval conference on Information retrieval technology
Year:
2008

Citing 8
Cited 1

Automatic text processing: the transformation, analysis, and retrieval of information by computer

Automatic text processing: the transformation, analysis, and retrieval of information by computer
An example-based mapping method for text categorization and retrieval

ACM Transactions on Information Systems (TOIS)
Foundations of statistical natural language processing

Foundations of statistical natural language processing
Machine Learning

Machine Learning
Text Categorization with Suport Vector Machines: Learning with Many Relevant Features

ECML '98 Proceedings of the 10th European Conference on Machine Learning
A Comparative Study on Feature Selection in Text Categorization

ICML '97 Proceedings of the Fourteenth International Conference on Machine Learning
RCV1: A New Benchmark Collection for Text Categorization Research

The Journal of Machine Learning Research
OCFS: optimal orthogonal centroid feature selection for text categorization

Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval

Comparison of feature selection methods for sentiment analysis

AI'10 Proceedings of the 23rd Canadian conference on Advances in Artificial Intelligence

Quantified Score

Hi-index	0.00

Visualization

Abstract

Maximum entropy provides a reasonable way of estimating probability distributions and has been widely used for a number of language processing tasks. In this paper, we explore the use of different feature selection methods for text categorization using maximum entropy modeling. We also propose a new feature selection method based on the difference between the relative document frequencies of a feature for both relevant and irrelevant classes. Our experiments on the Reuters RCV1 data set show that our own feature selection performs better than the other feature selection methods and maximum entropy modeling is a competitive method for text categorization.