Part-of-speech (POS) tagging is the foundation of natural language processing (NLP) systems, and thus has been an active area of research for many years. However, one question remains unanswered: How will a POS tagger behave when the input text is not error-free? This issue can be of great importance when the text comes from imperfect sources like Optical Character Recognition (OCR). This paper analyzes the performance of both individual POS taggers and combination systems on imperfect text. Experimental results show that a POS tagger's accuracy will decrease linearly with the character error rate, and the slope indicates a tagger's sensitivity to input text errors.
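
The linear relationship described above can be illustrated with a small least-squares fit. The sketch below is only an illustration under stated assumptions: the function name `fit_line`, the tagger labels, and all data points are hypothetical and made up for the example; they are not results from the paper, and the paper may have estimated its slopes differently.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical measurements (NOT from the paper): tagging accuracy of two
# imaginary taggers at increasing character error rates.
char_error_rate = [0.00, 0.02, 0.04, 0.06, 0.08]
accuracy_tagger_a = [0.96, 0.93, 0.90, 0.87, 0.84]
accuracy_tagger_b = [0.95, 0.93, 0.91, 0.89, 0.87]

slope_a, intercept_a = fit_line(char_error_rate, accuracy_tagger_a)
slope_b, intercept_b = fit_line(char_error_rate, accuracy_tagger_b)

# A steeper (more negative) slope means accuracy drops faster per unit of
# character error rate, i.e. the tagger is more sensitive to input errors.
more_sensitive = "A" if slope_a < slope_b else "B"
print(f"tagger A: slope={slope_a:.2f}, intercept={intercept_a:.2f}")
print(f"tagger B: slope={slope_b:.2f}, intercept={intercept_b:.2f}")
print(f"more sensitive to input errors: tagger {more_sensitive}")
```

In this made-up example, the intercept approximates each tagger's accuracy on clean text, while the slope gives the sensitivity: tagger A starts slightly higher but degrades faster than tagger B as the character error rate grows.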