A probabilistic model for text categorization: based on a single random variable with multiple values

  • Authors: Makoto Iwayama; Takenobu Tokunaga

  • Affiliations: Advanced Research Laboratory, Hitachi Ltd., Hatoyama, Saitama, Japan; Tokyo Institute of Technology, Ôokayama, Meguro-Ku, Tokyo, Japan

  • Venue: ANLC '94 Proceedings of the fourth conference on Applied natural language processing
  • Year: 1994

Abstract

Text categorization is the classification of documents with respect to a set of predefined categories. In this paper, we propose a new probabilistic model for text categorization that is based on a Single random Variable with Multiple Values (SVMV). Compared to previous probabilistic models, our model has the following advantages: 1) it considers within-document term frequencies, 2) it considers term weighting for target documents, and 3) it is less affected by insufficient training cases. We verify our model's superiority over the other models on the task of categorizing news articles from the "Wall Street Journal".
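
The abstract does not give the model's formulas, so the following is only a rough sketch of what treating index terms as the values of a single random variable T might look like. It assumes that P(T = t | text) is estimated by relative term frequency and that a document d is scored against a category c by P(c) · Σ_t P(t | c) · P(t | d) / P(t). The estimators, the scoring rule, and the names `term_distribution` and `score` are illustrative assumptions, not taken from the abstract.

```python
from collections import Counter


def term_distribution(tokens):
    """Relative term frequencies: an assumed estimate of P(T = t | text)."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {term: n / total for term, n in counts.items()}


def score(doc_tokens, category_tokens, all_tokens, prior):
    """Assumed score: prior * sum over t of P(t|c) * P(t|d) / P(t)."""
    p_d = term_distribution(doc_tokens)        # within-document frequencies
    p_c = term_distribution(category_tokens)   # frequencies in the category's training text
    p_all = term_distribution(all_tokens)      # frequencies in the whole training set
    return prior * sum(
        p_c.get(t, 0.0) * p_td / p_all[t]
        for t, p_td in p_d.items()
        if t in p_all  # terms unseen in training contribute nothing
    )


# Toy data, invented purely for illustration.
finance = ["stock", "market", "bank", "stock"]
sports = ["game", "team", "score", "game"]
training = finance + sports
doc = ["stock", "market", "stock"]

s_finance = score(doc, finance, training, prior=0.5)
s_sports = score(doc, sports, training, prior=0.5)
print("finance" if s_finance > s_sports else "sports")
```

Because the document's own term frequencies enter the sum, a term that occurs twice in the document contributes twice as much as one that occurs once, which is one way to read advantage 1) above.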