Rapid adaptation of POS tagging for domain specific uses

Authors:
John E. Miller;Michael Bloodgood;Manabu Torii;K. Vijay-Shanker
Affiliations:
University of Delaware, Newark, DE;University of Delaware, Newark, DE;Georgetown University Medical Center, Washington, DC;University of Delaware, Newark, DE
Venue:
LNLBioNLP '06: Proceedings of the HLT-NAACL BioNLP Workshop on Linking Natural Language and Biology
Year:
2006

Citing 4
Cited 0

Transformation-based error-driven learning and natural language processing: a case study in part-of-speech tagging
Computational Linguistics

Building a large annotated corpus of English: the Penn Treebank
Computational Linguistics - Special issue on using large corpora: II

Transformation-based learning in the fast lane
NAACL '01 Proceedings of the second meeting of the North American Chapter of the Association for Computational Linguistics on Language technologies

MedPost: a part-of-speech tagger for bioMedical text
Bioinformatics

Abstract

Part-of-speech (POS) tagging is a fundamental component of natural language tasks such as parsing, information extraction, and question answering. When POS taggers are trained in one domain and applied in significantly different domains, their performance can degrade dramatically. We present a methodology for rapid adaptation of POS taggers to new domains. Our technique is unsupervised in that it does not require a manually annotated corpus for the new domain. To increase lexical coverage, we use suffix information gathered from large amounts of raw text together with orthographic information. We present an experiment in the biological domain in which our POS tagger achieves results comparable to those of POS taggers trained specifically for this domain.
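
The abstract does not give the details of the adaptation procedure, so the following is only a minimal illustrative sketch of the general idea of extending a tagger's lexicon with suffix and orthographic cues. It is not the authors' method. The function names, the toy orthographic rules, the maximum suffix length of 4, and the choice to estimate suffix statistics from an existing source-domain lexicon are all assumptions made for illustration.

```python
from collections import Counter


def build_suffix_table(lexicon, max_len=4):
    """Count, for each word-final suffix of 1..max_len characters,
    how often each tag occurs among the lexicon words with that suffix.

    `lexicon` maps lowercase words to sets of POS tags (hypothetical format).
    """
    table = {}
    for word, tags in lexicon.items():
        for n in range(1, min(max_len, len(word)) + 1):
            counts = table.setdefault(word[-n:], Counter())
            for tag in tags:
                counts[tag] += 1
    return table


def orthographic_guess(word):
    """Toy orthographic cues (assumed here, not taken from the paper)."""
    if any(ch.isdigit() for ch in word):
        # Pure numbers such as "42" or "3.5" are tagged as cardinals;
        # mixed tokens such as "p53" are treated as nouns.
        if word.replace(",", "").replace(".", "", 1).isdigit():
            return "CD"
        return "NN"
    if any(ch.isupper() for ch in word[1:]):
        # Word-internal capitals (e.g. "NF-kB") suggest a name-like noun.
        return "NN"
    return None


def guess_unknown_tag(word, table, max_len=4, default="NN"):
    """Guess a tag for a word missing from the lexicon."""
    cue = orthographic_guess(word)
    if cue is not None:
        return cue
    w = word.lower()
    # Back off from the longest suffix to the shortest.
    for n in range(min(max_len, len(w)), 0, -1):
        counts = table.get(w[-n:])
        if counts:
            return counts.most_common(1)[0][0]
    return default


def extend_lexicon(raw_tokens, lexicon, max_len=4):
    """Return a copy of `lexicon` with guessed entries for unseen tokens."""
    table = build_suffix_table(lexicon, max_len)
    extended = {w: set(tags) for w, tags in lexicon.items()}
    for tok in raw_tokens:
        key = tok.lower()
        if key not in extended:
            extended[key] = {guess_unknown_tag(tok, table, max_len)}
    return extended
```

For example, with a small source lexicon containing "activation" and "regulation" as nouns and "quickly" as an adverb, calling `extend_lexicon` on raw target-domain tokens would add "phosphorylation" through the "tion" suffix and "constitutively" through the "ly" suffix, without any annotated target-domain text. A real system would need far larger lexicons and a principled way to weigh conflicting cues.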