This work addresses the problem of classifying the genre of a text, a task that is useful for a variety of language processing problems. We propose statistics of part-of-speech (POS) histograms as classification features, coupled with a quadratic discriminant classifier. In experiments on six different text and speech genres, these features outperform standard techniques based on word frequency counts and on POS trigrams. Experiments on genres not seen in training show intuitive overlaps with the training classes.
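The abstract names the two ingredients (statistics of POS histograms and a quadratic discriminant classifier) but not their exact form. The sketch below is one possible reading under stated assumptions, not the authors' implementation. It assumes each document is already POS-tagged, cuts the tag sequence into non-overlapping windows, computes each tag's relative frequency per window, and summarises those histograms by their per-tag mean and standard deviation. The classifier fits one full-covariance Gaussian per genre (quadratic discriminant analysis), with a small ridge term added to each covariance so it stays invertible. The tagset, window size, ridge value, and the two synthetic "genres" are all hypothetical.

```python
import math
import random
from collections import Counter

# Hypothetical coarse tagset; a real system would use its tagger's inventory.
TAGS = ["NOUN", "VERB", "ADJ", "ADV", "PRON", "DET", "ADP", "PUNCT"]

def pos_histogram_stats(tags, window=10):
    """Per-tag mean and std of POS relative frequencies over fixed windows.

    Assumed feature design: non-overlapping windows of `window` tokens;
    a trailing partial window is dropped.
    """
    windows = [tags[i:i + window] for i in range(0, len(tags) - window + 1, window)]
    if not windows:
        raise ValueError("document shorter than one window")
    hists = [[Counter(w)[t] / len(w) for t in TAGS] for w in windows]
    n, d = len(hists), len(TAGS)
    means = [sum(h[j] for h in hists) / n for j in range(d)]
    stds = [math.sqrt(sum((h[j] - means[j]) ** 2 for h in hists) / n) for j in range(d)]
    return means + stds

def cholesky(a):
    """Lower-triangular L with a = L L^T (a must be positive definite)."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L

def forward_solve(L, b):
    """Solve L y = b for lower-triangular L."""
    y = []
    for i in range(len(b)):
        y.append((b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i])
    return y

class QDA:
    """Quadratic discriminant: one Gaussian (mean, full covariance) per class."""

    def __init__(self, reg=1e-3):
        self.reg = reg  # assumed ridge on the covariance diagonal

    def fit(self, X, y):
        self.params = {}
        for c in sorted(set(y)):
            rows = [x for x, lab in zip(X, y) if lab == c]
            n, d = len(rows), len(rows[0])
            mu = [sum(r[j] for r in rows) / n for j in range(d)]
            cov = [[sum((r[i] - mu[i]) * (r[j] - mu[j]) for r in rows) / n
                    + (self.reg if i == j else 0.0) for j in range(d)] for i in range(d)]
            L = cholesky(cov)
            logdet = 2.0 * sum(math.log(L[i][i]) for i in range(d))
            self.params[c] = (mu, L, logdet, math.log(n / len(y)))
        return self

    def score(self, x, c):
        # log prior - 0.5 log|Sigma| - 0.5 (x - mu)^T Sigma^{-1} (x - mu)
        mu, L, logdet, logprior = self.params[c]
        z = forward_solve(L, [xi - mi for xi, mi in zip(x, mu)])
        return logprior - 0.5 * logdet - 0.5 * sum(v * v for v in z)

    def predict(self, x):
        return max(self.params, key=lambda c: self.score(x, c))

# Usage on synthetic tag sequences (hypothetical genre tag weights, TAGS order).
rng = random.Random(0)
NEWS = [6, 2, 3, 0.5, 0.5, 3, 3, 1]
CONV = [1.5, 5, 0.5, 3, 5, 1, 1, 2]

def synthetic_doc(weights, n=200):
    return rng.choices(TAGS, weights=weights, k=n)

genres = (("news", NEWS), ("conv", CONV))
train = [(synthetic_doc(w), g) for g, w in genres for _ in range(30)]
model = QDA().fit([pos_histogram_stats(d) for d, _ in train], [g for _, g in train])
held_out = [(synthetic_doc(w), g) for g, w in genres for _ in range(10)]
accuracy = sum(model.predict(pos_histogram_stats(d)) == g for d, g in held_out) / len(held_out)
print(accuracy)
```

The Cholesky factor gives both terms of the quadratic discriminant cheaply: the log-determinant is twice the sum of the log diagonal entries, and the Mahalanobis distance is the squared norm of the forward-solved residual. Using a full covariance per class (rather than a shared one) is what makes the decision boundary quadratic instead of linear.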