PAISI'10 Proceedings of the 2010 Pacific Asia conference on Intelligence and Security Informatics
Classification of documents by genre is typically done using either linguistic analysis or term frequency based techniques. The former provides better classification accuracy than the latter, but at the cost of two orders of magnitude more computation time. While term frequency analysis requires far fewer computational resources than linguistic analysis, it returns poor classification accuracy when the genres are not sufficiently distinct. A method that removes or approximates the expensive portions of linguistic analysis is presented. The accuracy and computation time of this method are then compared with those of both linguistic analysis and term frequency analysis. The results in this paper show that this method can significantly reduce the computation time of both linguistic analysis and term frequency analysis, while retaining an accuracy that is higher than that of term frequency analysis.