Creating a test corpus of clinical notes manually tagged for part-of-speech information

  • Authors:
  • Serguei Pakhomov; Anni Coden; Christopher Chute

  • Affiliations:
  • Mayo Clinic, Rochester, MN; IBM, T.J. Watson Research Center, Hawthorne, NY; Mayo Clinic, Rochester, MN

  • Venue:
  • JNLPBA '04 Proceedings of the International Joint Workshop on Natural Language Processing in Biomedicine and its Applications
  • Year:
  • 2004

Abstract

This paper presents a project whose main goal is to construct a corpus of clinical text manually annotated for part-of-speech information. We describe and discuss the process of training three domain experts to perform linguistic annotation. We list some of the challenges as well as encouraging results pertaining to inter-rater agreement and consistency of annotation. We also present preliminary experimental results indicating the necessity for adapting state-of-the-art POS taggers to the sublanguage domain of medical text.
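
The abstract does not say which measure was used for inter-rater agreement. As an illustration only, the sketch below computes Cohen's kappa, one common chance-corrected agreement statistic, for two annotators who tagged the same tokens. The function name `cohen_kappa`, the tag labels, and the sample tags are all hypothetical and are not taken from the paper.

```python
from collections import Counter


def cohen_kappa(tags_a, tags_b):
    """Cohen's kappa for two annotators' tag sequences over the same tokens."""
    if len(tags_a) != len(tags_b) or not tags_a:
        raise ValueError("tag sequences must be non-empty and of equal length")
    n = len(tags_a)
    # Observed agreement: fraction of tokens given the same tag by both annotators.
    observed = sum(a == b for a, b in zip(tags_a, tags_b)) / n
    # Chance agreement: sum over tags of the product of each annotator's tag rate.
    counts_a = Counter(tags_a)
    counts_b = Counter(tags_b)
    expected = sum(counts_a[t] * counts_b[t] for t in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical tag throughout
    return (observed - expected) / (1.0 - expected)


# Hypothetical tags for a five-token phrase; not data from the paper.
annotator_1 = ["NN", "VBZ", "JJ", "NN", "IN"]
annotator_2 = ["NN", "VBZ", "NN", "NN", "IN"]
kappa = cohen_kappa(annotator_1, annotator_2)
print(f"kappa = {kappa:.3f}")
```

With three annotators, as in the project described, one simple option is to report kappa for each pair of annotators; multi-rater statistics such as Fleiss' kappa are another possibility.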