A pilot annotation to investigate discourse connectivity in biomedical text

  • Authors:
  • Hong Yu;Nadya Frid;Susan McRoy;Rashmi Prasad;Alan Lee;Aravind Joshi

  • Affiliations:
  • University of Wisconsin-Milwaukee, Milwaukee, WI;University of Wisconsin-Milwaukee, Milwaukee, WI;University of Wisconsin-Milwaukee, Milwaukee, WI;University of Pennsylvania, Philadelphia, PA;University of Pennsylvania, Philadelphia, PA;University of Pennsylvania, Philadelphia, PA

  • Venue:
  • BioNLP '08 Proceedings of the Workshop on Current Trends in Biomedical Natural Language Processing
  • Year:
  • 2008

Abstract

The goal of the Penn Discourse Treebank (PDTB) project is to develop a large-scale corpus annotated with coherence relations marked by discourse connectives. To date, PDTB annotation has been applied primarily to news articles. In this study, we tested whether the PDTB guidelines can be adapted to a different genre. We annotated discourse connectives and their arguments in one 4,937-token full-text biomedical article. Two linguist annotators showed an agreement of 85% after simple conventions were added. For the remaining 15% of cases, we found that biomedical domain-specific knowledge is needed to capture the linguistic cues that can be used to resolve inter-annotator disagreement. We found that the two annotators were able to reach agreement after discussion. Thus, our experiments suggest that the PDTB annotation can be adapted to new domains by minimally adjusting the guidelines and by adding some further domain-specific linguistic cues.