Classifying factored genres with part-of-speech histograms

  • Authors:
  • S. Feldman; M. Marin; J. Medero; M. Ostendorf

  • Affiliations:
  • University of Washington, Seattle, Washington (all authors)

  • Venue:
  • NAACL-Short '09: Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, Companion Volume: Short Papers
  • Year:
  • 2009

Abstract

This work addresses the problem of genre classification of text and speech transcripts, with the goal of handling genres not seen in training. Two frameworks employing different statistics on word/POS histograms with a PCA transform are examined: a single model for each genre and a factored representation of genre. The impact of the two frameworks on the classification of training-matched and new genres is discussed. Results show that the factored models allow for a finer-grained representation of genre and can more accurately characterize genres not seen in training.
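
The feature pipeline named in the abstract (POS histograms followed by a PCA transform) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the tag inventory `TAGS`, the toy documents, the use of relative frequencies as the histogram statistic, and the helper names `pos_histogram`, `pca_fit`, and `pca_transform`. The abstract does not state which statistics, tag set, or classifiers the authors used, so the per-genre and factored models themselves are not sketched.

```python
import numpy as np

# Hypothetical tag inventory; the abstract does not give the actual tag set.
TAGS = ["NOUN", "VERB", "ADJ", "PRON", "OTHER"]


def pos_histogram(tags, inventory=TAGS):
    """Return the relative frequency of each inventory tag in one document."""
    index = {tag: i for i, tag in enumerate(inventory)}
    counts = np.zeros(len(inventory))
    for tag in tags:
        # Tags outside the inventory are folded into the "OTHER" bin.
        counts[index.get(tag, index["OTHER"])] += 1
    total = counts.sum()
    return counts / total if total > 0 else counts


def pca_fit(X, n_components):
    """Fit PCA via SVD of the mean-centred data; return (mean, components)."""
    mean = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_components]


def pca_transform(X, mean, components):
    """Project rows of X onto the fitted principal components."""
    return (X - mean) @ components.T


# Toy POS-tagged documents (invented for illustration only).
docs = [
    ["PRON", "VERB", "PRON", "VERB", "NOUN"],
    ["NOUN", "ADJ", "NOUN", "VERB", "NOUN"],
    ["PRON", "VERB", "ADJ", "PRON", "VERB"],
    ["NOUN", "NOUN", "ADJ", "NOUN", "VERB"],
]

X = np.vstack([pos_histogram(d) for d in docs])   # one histogram per document
mean, components = pca_fit(X, n_components=2)
Z = pca_transform(X, mean, components)            # reduced features per document
```

The rows of `Z` are the kind of low-dimensional features a downstream genre classifier could consume. A new document would be projected with the same `mean` and `components` fitted on the training data.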