Adaptation of POS tagging for multiple BioMedical domains

Authors:
John E. Miller; Manabu Torii; K. Vijay-Shanker
Affiliations:
University of Delaware, Newark, DE; Georgetown University Medical Center, Washington, DC; University of Delaware, Newark, DE
Venue:
BioNLP '07 Proceedings of the Workshop on BioNLP 2007: Biological, Translational, and Clinical Language Processing
Year:
2007

References (5)

Building a large annotated corpus of English: the Penn Treebank
Computational Linguistics - Special issue on using large corpora: II

MedPost: a part-of-speech tagger for bioMedical text
Bioinformatics

The importance of the lexicon in tagging biological text
Natural Language Engineering

Developing a robust part-of-speech tagger for biomedical text
PCI'05 Proceedings of the 10th Panhellenic conference on Advances in Informatics

Parsing biomedical literature
IJCNLP'05 Proceedings of the Second international joint conference on Natural Language Processing

Cited by (1)

A token centric part-of-speech tagger for biomedical text
AIME'11 Proceedings of the 13th conference on Artificial intelligence in medicine

Abstract

Part-of-speech (POS) tagging is often a prerequisite for tasks such as partial parsing and information extraction. However, when a POS tagger is simply ported to another domain, its accuracy drops. This problem can be addressed by hand-annotating a corpus in the new domain and training a new tagger on it with supervision. In our methodology, we instead use existing raw text and a generic POS-annotated corpus to develop taggers for new domains without hand annotation or supervised training. We focus in particular on out-of-vocabulary words, since they reduce accuracy (Lease and Charniak, 2005; Smith et al., 2005).
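
To make the out-of-vocabulary problem concrete, the following minimal sketch measures how many tokens of a new-domain text never occur in a generic training corpus. It is an illustration only, not the method of the paper: the function name `oov_stats`, the whitespace tokenization, the case-insensitive matching, and the toy sentences are all assumptions made for this example.

```python
from collections import Counter


def oov_stats(generic_tokens, domain_tokens):
    """Return (OOV rate, Counter of OOV tokens) for domain_tokens.

    A domain token counts as out-of-vocabulary when its lowercased form
    does not appear among the lowercased generic-corpus tokens.
    """
    vocab = {tok.lower() for tok in generic_tokens}
    oov = [tok for tok in domain_tokens if tok.lower() not in vocab]
    # Guard against an empty domain sample to avoid division by zero.
    rate = len(oov) / len(domain_tokens) if domain_tokens else 0.0
    return rate, Counter(oov)


# Toy data (hypothetical): a "generic" sentence and a "biomedical" sentence.
generic = "the patient was treated with the drug".split()
domain = "the kinase phosphorylates the substrate".split()

rate, oov_counts = oov_stats(generic, domain)
print(f"OOV rate: {rate:.2f}")
print(oov_counts.most_common())
```

On real data, the generic tokens would come from the annotated corpus and the domain tokens from the raw text of the new domain; a high rate indicates that many words have no tag evidence in the training data.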