Automatic genre classification by using co-training

  • Authors:
  • Rui Liu; Minghu Jiang; Zheng Tie

  • Affiliations:
  • Lab. of Computational Linguistics, School of Humanities and Social Sciences, Tsinghua University, Beijing, China (all three authors)

  • Venue:
  • FSKD'09 Proceedings of the 6th international conference on Fuzzy systems and knowledge discovery - Volume 1
  • Year:
  • 2009

Abstract

Researchers have concentrated on topic-based text classification, while the genre of a document is rarely considered. In this article, we discuss automatic genre classification and its applications. We argue that word-level features and sentence-level features are two important measures whose distributions vary across genres. Word-level features include word frequencies and POS (part-of-speech) tag statistics. Sentence-level features include grammar rules, whose usage is strongly tied to genre. Treating these two feature sets as two views of a document, we explore a robust approach that employs the co-training method to achieve high effectiveness in genre classification.
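To make the two-view idea concrete, here is a minimal co-training sketch in the general style introduced by Blum and Mitchell. It is not the authors' implementation. The abstract names neither the base classifiers nor the selection rule, so the following are all assumptions chosen for brevity: a nearest-centroid classifier with cosine similarity per view, a confidence score equal to the margin between the best and second-best class, one newly labeled example per view per round, and a final decision taken from whichever view is more confident. The feature vectors are hypothetical stand-ins for the word-level (word frequency, POS statistics) and sentence-level (grammar rule) features.

```python
import math
from collections import defaultdict

# Each document is a pair of feature vectors: (word_view, sentence_view).
# Assumption: features are already extracted into fixed-length numeric lists.


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def train_centroids(vectors, labels):
    """Nearest-centroid classifier (a stand-in choice): mean vector per genre."""
    sums, counts = {}, defaultdict(int)
    for vec, lab in zip(vectors, labels):
        if lab not in sums:
            sums[lab] = [0.0] * len(vec)
        sums[lab] = [s + v for s, v in zip(sums[lab], vec)]
        counts[lab] += 1
    return {lab: [s / counts[lab] for s in sums[lab]] for lab in sums}


def predict(centroids, vec):
    """Return (label, confidence); confidence is the best-vs-second-best margin."""
    scores = sorted(((cosine(c, vec), lab) for lab, c in centroids.items()), reverse=True)
    best_score, best_label = scores[0]
    margin = best_score - scores[1][0] if len(scores) > 1 else best_score
    return best_label, margin


def co_train(labeled, labels, unlabeled, rounds=10):
    """Two-view co-training loop.

    In each round, the classifier for each view labels the unlabeled document
    it is most confident about; that document joins the shared labeled pool,
    so the view that labeled it also supplies new training data to the other view.
    """
    labeled, labels, pool = list(labeled), list(labels), list(unlabeled)
    for _ in range(rounds):
        for view in (0, 1):
            if not pool:
                break
            centroids = train_centroids([x[view] for x in labeled], labels)
            preds = [predict(centroids, x[view]) for x in pool]
            best = max(range(len(pool)), key=lambda j: preds[j][1])
            labeled.append(pool.pop(best))
            labels.append(preds[best][0])
    word_model = train_centroids([x[0] for x in labeled], labels)
    sent_model = train_centroids([x[1] for x in labeled], labels)
    return word_model, sent_model


def classify(word_model, sent_model, doc):
    """Assumed combination rule: trust whichever view is more confident."""
    w_label, w_conf = predict(word_model, doc[0])
    s_label, s_conf = predict(sent_model, doc[1])
    return w_label if w_conf >= s_conf else s_label


if __name__ == "__main__":
    # Toy, made-up data: two seed documents and two unlabeled ones.
    seeds = [([0.9, 0.1, 0.0], [0.8, 0.2]), ([0.1, 0.9, 0.1], [0.1, 0.9])]
    seed_labels = ["news", "fiction"]
    unlabeled = [([0.8, 0.2, 0.1], [0.7, 0.3]), ([0.2, 0.8, 0.0], [0.2, 0.8])]
    w, s = co_train(seeds, seed_labels, unlabeled, rounds=2)
    print(classify(w, s, ([0.85, 0.15, 0.0], [0.75, 0.25])))  # prints "news"
```

The essential design point is that each view's classifier only sees its own features, so an example that is easy to label from word statistics can still add useful training data for the grammar-rule view, and vice versa. Any base classifier that reports a confidence score could replace the nearest-centroid choice used here.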