Building a large annotated corpus of English: the Penn Treebank
Computational Linguistics - Special issue on using large corpora: II
MedPost: a part-of-speech tagger for bioMedical text
Bioinformatics
The importance of the lexicon in tagging biological text
Natural Language Engineering
Developing a robust part-of-speech tagger for biomedical text
PCI'05 Proceedings of the 10th Panhellenic conference on Advances in Informatics
IJCNLP'05 Proceedings of the Second international joint conference on Natural Language Processing
A token centric part-of-speech tagger for biomedical text
AIME'11 Proceedings of the 13th conference on Artificial intelligence in medicine
Part-of-speech (POS) tagging is often a prerequisite for tasks such as partial parsing and information extraction. However, when a POS tagger is simply ported to another domain, its accuracy drops. This problem can be addressed by hand-annotating a corpus in the new domain and training a new tagger on it with supervision. Our methodology instead uses existing raw text and a generic POS-annotated corpus to develop taggers for new domains without hand annotation or supervised training. We focus in particular on out-of-vocabulary words, since they reduce accuracy (Lease and Charniak, 2005; Smith et al., 2005).
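To make the out-of-vocabulary notion concrete, the following is a minimal sketch of how one might measure the share of tokens in raw target-domain text that never occur in the generic annotated corpus. It is not the method of the paper: the function name `oov_rate`, the lowercasing, the whitespace tokenization, and the toy sentences are all assumptions chosen for illustration.

```python
from collections import Counter


def oov_rate(source_tokens, target_tokens):
    """Return (token-level OOV rate, Counter of OOV words) for the target text.

    A target token counts as out-of-vocabulary when its lowercased form
    never appears among the lowercased source tokens (an assumed definition).
    """
    vocab = {tok.lower() for tok in source_tokens}
    oov = Counter(tok.lower() for tok in target_tokens if tok.lower() not in vocab)
    total = len(target_tokens)
    rate = sum(oov.values()) / total if total else 0.0
    return rate, oov


# Toy data (hypothetical): a "generic" sentence and a "biomedical" sentence.
source = "the patient was given the drug".split()
target = "the kinase phosphorylates the substrate".split()

rate, oov = oov_rate(source, target)
print(f"OOV rate: {rate:.2f}")
print("OOV words:", sorted(oov))
```

On real corpora the source vocabulary would come from the tagged generic corpus and the target tokens from the raw domain text, with whatever tokenizer the tagger itself uses.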