Maximum entropy modeling with feature selection for text categorization

  • Authors: Jihong Cai; Fei Song
  • Affiliations: Department of Computing and Information Science, University of Guelph, Guelph, Ontario, Canada (both authors)
  • Venue: AIRS'08: Proceedings of the 4th Asia Information Retrieval Conference on Information Retrieval Technology
  • Year: 2008

Abstract

Maximum entropy modeling provides a reasonable way of estimating probability distributions and has been widely used for a number of language processing tasks. In this paper, we explore the use of different feature selection methods for text categorization with maximum entropy modeling. We also propose a new feature selection method based on the difference between a feature's relative document frequencies in the relevant and irrelevant classes. Our experiments on the Reuters RCV1 data set show that the proposed feature selection method performs better than the other feature selection methods tested, and that maximum entropy modeling is a competitive method for text categorization.
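
The abstract does not give the exact scoring formula, so the following is only a minimal sketch of one plausible reading of the proposed criterion: score each term by its relative document frequency among relevant documents minus its relative document frequency among irrelevant documents, then keep the top-scoring terms. The function names `relative_df_difference` and `select_top_k`, the use of a signed (rather than absolute) difference, and the tie-breaking rule are all assumptions made for illustration, not details taken from the paper.

```python
from collections import Counter


def relative_df_difference(docs, labels):
    """Score terms by (relative DF in relevant docs) - (relative DF in irrelevant docs).

    docs   -- list of token lists, one per document
    labels -- list of booleans, True if the document is relevant to the category
    NOTE: the signed difference is an assumption; the paper may define it differently.
    """
    pos_df, neg_df = Counter(), Counter()
    n_pos = n_neg = 0
    for tokens, is_relevant in zip(docs, labels):
        terms = set(tokens)  # document frequency counts each term once per document
        if is_relevant:
            n_pos += 1
            pos_df.update(terms)
        else:
            n_neg += 1
            neg_df.update(terms)
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one relevant and one irrelevant document")
    vocab = set(pos_df) | set(neg_df)
    return {t: pos_df[t] / n_pos - neg_df[t] / n_neg for t in vocab}


def select_top_k(scores, k):
    """Return the k highest-scoring terms; ties are broken alphabetically (an assumption)."""
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]
```

Under this reading, a term that appears in most relevant documents and few irrelevant ones scores close to 1, while a term that is equally common in both classes scores close to 0 and is dropped first. The selected terms would then serve as the features of the maximum entropy classifier.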