A token centric part-of-speech tagger for biomedical text

  • Authors: Neil Barrett; Jens Weber-Jahnke
  • Affiliations: Department of Computer Science, University of Victoria, Victoria, Canada (both authors)
  • Venue: AIME'11 Proceedings of the 13th conference on Artificial intelligence in medicine
  • Year: 2011

Abstract

A difficulty with part-of-speech (POS) tagging of biomedical text is accessing and annotating appropriate training corpora. This difficulty may result in POS taggers trained on corpora that differ from the tagger's target biomedical text. In such cases, where training and target corpora differ, tagging accuracy decreases. We present a POS tagger that is more accurate than two frequently used biomedical POS taggers (Brill and TnT) when trained on a non-biomedical corpus and evaluated on the MedPost corpus (our tagger: 81.0%, Brill: 77.5%, TnT: 78.2%). Our tagger is also significantly faster than the next best tagger (TnT). It estimates a tag's likelihood for a token by combining prior probabilities (using existing methods) and token probabilities calculated in part using a Naive Bayes classifier. Our results suggest that future work should reexamine POS tagging methods for biomedical text. This differs from the work to date, which has focused on retraining existing POS taggers.
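
The abstract does not give the tagger's formulas, so the following is only a minimal sketch of the general idea of scoring each tag by combining a prior probability with a token probability from a Naive Bayes model. Everything in it is an assumption made for illustration: the class name `SketchTagger`, the surface features (suffix, capitalization, digit, hyphen), the tag-bigram prior, the add-one smoothing, the greedy left-to-right decoding, and the toy training sentences. It is not the authors' implementation and will not reproduce the reported accuracies.

```python
import math
from collections import Counter, defaultdict


def features(token):
    """Hypothetical surface features; the paper's real feature set is not stated."""
    return (
        "suf=" + token[-3:].lower(),
        "cap=" + str(token[:1].isupper()),
        "dig=" + str(any(c.isdigit() for c in token)),
        "hyp=" + str("-" in token),
    )


class SketchTagger:
    """Greedy tagger: score(tag) = log P(tag | previous tag) + log P(features | tag)."""

    def __init__(self, tagged_sentences):
        self.trans = defaultdict(Counter)  # previous tag -> counts of next tag
        self.feat = defaultdict(Counter)   # tag -> counts of token features
        self.tag_count = Counter()
        self.vocab = set()                 # all features seen in training
        for sentence in tagged_sentences:
            prev = "<s>"
            for token, tag in sentence:
                self.trans[prev][tag] += 1
                self.tag_count[tag] += 1
                for f in features(token):
                    self.feat[tag][f] += 1
                    self.vocab.add(f)
                prev = tag
        self.tags = sorted(self.tag_count)

    def log_prior(self, prev, tag):
        # Assumed prior: add-one smoothed tag bigram probability.
        counts = self.trans[prev]
        return math.log((counts[tag] + 1) / (sum(counts.values()) + len(self.tags)))

    def log_token(self, token, tag):
        # Naive Bayes: features are treated as independent given the tag.
        counts = self.feat[tag]
        total = sum(counts.values()) + len(self.vocab)
        return sum(math.log((counts[f] + 1) / total) for f in features(token))

    def tag(self, tokens):
        prev, out = "<s>", []
        for token in tokens:
            best = max(
                self.tags,
                key=lambda t: self.log_prior(prev, t) + self.log_token(token, t),
            )
            out.append(best)
            prev = best
        return out


# Toy training data, invented for this sketch (not from any real corpus).
TRAIN = [
    [("The", "DT"), ("protein", "NN"), ("binds", "VBZ"), ("the", "DT"), ("receptor", "NN")],
    [("A", "DT"), ("kinase", "NN"), ("activates", "VBZ"), ("the", "DT"), ("pathway", "NN")],
]

tagger = SketchTagger(TRAIN)
result = tagger.tag(["The", "enzyme", "inhibits", "the", "receptor"])
print(list(zip(["The", "enzyme", "inhibits", "the", "receptor"], result)))
```

Because the token model relies on surface features instead of whole-word identity, it can still score words such as "enzyme" that never appeared in training, which is one plausible reason a token-focused model might transfer across corpora.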