Text categorization for streams

  • Authors:
  • D. L. Thomas;W. J. Teahan

  • Affiliations:
  • University Of Wales: Bangor, Bangor, Wales, UK;University Of Wales: Bangor, Bangor, Wales, UK

  • Venue:
  • SIGIR '07 Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval
  • Year:
  • 2007

Quantified Score

Hi-index 0.00

Visualization

Abstract

We describe a novel system for evaluating and performing stream-based text categorization. Stream-based text categorization considers the text being categorized as a stream of symbols, which differs from the traditional feature-based approach which relies on extracting features from the text. The system implements character-based languages models--specifically models based on the PPM text compression scheme--as well as count-based measures such as R-Measure and C-Measure. Use of the system demonstrates that all of these techniques outperform SVM, a feature-based classifier, at stream-related classification tasks such as authorship ascription.