Part-of-speech histograms for genre classification of text
ICASSP '09 Proceedings of the 2009 IEEE International Conference on Acoustics, Speech and Signal Processing
This work addresses the problem of genre classification of text and speech transcripts, with the goal of handling genres not seen in training. Two frameworks employing different statistics on word/POS histograms with a PCA transform are examined: a single model for each genre and a factored representation of genre. The impact of the two frameworks on the classification of training-matched and new genres is discussed. Results show that the factored models allow for a finer-grained representation of genre and can more accurately characterize genres not seen in training.
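
As a rough illustration of the "single model for each genre" idea, the sketch below builds POS histograms for short windows of tags, applies a PCA transform, and assigns a new window to the genre with the nearest centroid. It is a minimal sketch under stated assumptions, not the method evaluated in this work: the tag set, the toy windows, the genre names, and the nearest-centroid rule are all hypothetical stand-ins, and the factored representation of genre is not shown.

```python
import numpy as np

# Hypothetical, tiny tag set; a real system would use a full POS tag inventory.
TAGSET = ["NOUN", "VERB", "ADJ", "PRON", "OTHER"]

def pos_histogram(tags, tagset=TAGSET):
    """Return the relative frequency of each tag in one window of POS tags."""
    index = {t: i for i, t in enumerate(tagset)}
    counts = np.zeros(len(tagset))
    for t in tags:
        counts[index.get(t, index["OTHER"])] += 1  # unknown tags count as OTHER
    total = counts.sum()
    return counts / total if total > 0 else counts

def fit_pca(X, n_components):
    """Fit PCA via SVD of the centered data; return (mean, components)."""
    mean = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_components]

def project(X, mean, components):
    """Map histograms into the PCA space."""
    return (X - mean) @ components.T

def fit_centroids(Z, labels):
    """One model per genre: here simply the mean projected histogram (an assumption)."""
    labels = np.asarray(labels)
    return {g: Z[labels == g].mean(axis=0) for g in np.unique(labels)}

def classify(z, centroids):
    """Pick the genre whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda g: np.linalg.norm(z - centroids[g]))

# Toy training windows (invented for illustration only).
train = [
    (["NOUN", "NOUN", "ADJ", "VERB", "NOUN"], "news"),
    (["NOUN", "ADJ", "NOUN", "VERB", "NOUN", "NOUN"], "news"),
    (["PRON", "VERB", "PRON", "VERB", "OTHER"], "conversation"),
    (["PRON", "VERB", "OTHER", "PRON", "VERB", "PRON"], "conversation"),
]
X = np.array([pos_histogram(tags) for tags, _ in train])
labels = [genre for _, genre in train]

mean, components = fit_pca(X, n_components=2)
centroids = fit_centroids(project(X, mean, components), labels)

new_window = pos_histogram(["PRON", "VERB", "PRON", "OTHER"])
predicted = classify(project(new_window[None, :], mean, components)[0], centroids)
print(predicted)
```

A per-genre centroid is the simplest possible stand-in for a genre model; handling genres not seen in training would require a richer representation, which this sketch does not attempt.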