CDM: an approach to learning in text categorization

  • Authors: J. L. Goldberg

  • Venue: TAI '95 Proceedings of the Seventh International Conference on Tools with Artificial Intelligence

  • Year: 1995

Abstract

The category discrimination method (CDM) is a new learning algorithm designed for text categorization. It is motivated by the statistical problems that arise when natural language text is used as input to existing machine learning algorithms: too much noise, too many features, and a skewed distribution. CDM is based on research findings about how humans learn categories and concepts in relation to contrasting concepts. Its central formula is cue validity, borrowed from cognitive psychology, which is used to select, from all possible single-word features, the 'best' predictors of a given category. The hypothesis that CDM outperforms two non-domain-specific algorithms, Bayesian classification and decision tree learners, is tested empirically.
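
The abstract does not give the formula CDM uses, so the following is only a minimal sketch of cue validity as it is commonly defined in cognitive psychology: the conditional probability of a category given a cue, here estimated as the share of documents containing a word that belong to the category. The function names `cue_validity` and `best_predictors`, the document-frequency estimate, and the toy data are all assumptions made for illustration, not the paper's implementation.

```python
from collections import Counter, defaultdict


def cue_validity(docs):
    """Estimate P(category | word) from labelled documents.

    docs: iterable of (category, list_of_words) pairs.
    Returns {category: {word: validity}}.
    Assumption: validity is estimated from document frequencies,
    i.e. each word counts at most once per document.
    """
    word_in_cat = defaultdict(Counter)  # category -> word -> doc count
    word_total = Counter()              # word -> doc count over all categories
    for category, words in docs:
        for w in set(words):
            word_in_cat[category][w] += 1
            word_total[w] += 1
    return {
        c: {w: n / word_total[w] for w, n in counts.items()}
        for c, counts in word_in_cat.items()
    }


def best_predictors(docs, category, k=3):
    """Return the k single-word features with the highest cue validity
    for the given category (ties broken alphabetically)."""
    scores = cue_validity(docs).get(category, {})
    return sorted(scores, key=lambda w: (-scores[w], w))[:k]


# Hypothetical toy corpus, for illustration only.
docs = [
    ("sports", ["match", "goal", "team"]),
    ("sports", ["team", "coach", "win"]),
    ("finance", ["market", "team", "stock"]),
    ("finance", ["stock", "bank", "win"]),
]

print(best_predictors(docs, "sports"))
```

In this toy corpus, "team" appears in two sports documents and one finance document, so its estimated cue validity for sports is 2/3, while a word that occurs in only one category scores 1.0. A practical selector would probably also need a minimum-frequency threshold, since rare words trivially reach a validity of 1.0.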