The BioScope corpus: annotation for negation, uncertainty and their scope in biomedical texts

  • Authors:
  • György Szarvas;Veronika Vincze;Richárd Farkas;János Csirik

  • Affiliations:
  • University of Szeged, Szeged;University of Szeged, Szeged;Hungarian Academy of Science, Szeged;Hungarian Academy of Science, Szeged

  • Venue:
  • BioNLP '08 Proceedings of the Workshop on Current Trends in Biomedical Natural Language Processing
  • Year:
  • 2008

Abstract

This article reports on a corpus annotation project that has produced a freely available resource for research on handling negation and uncertainty in biomedical texts, which we call the BioScope corpus. The corpus consists of three parts: medical free texts, biological full papers, and biological scientific abstracts. The dataset contains token-level annotations for negative and speculative keywords and sentence-level annotations for their linguistic scope. Annotation was carried out by two independent linguist annotators and a chief annotator, who also set up the annotation guidelines and resolved cases where the two annotators disagreed. We report statistics on corpus size, ambiguity levels, and the consistency of annotations.
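
To make the two annotation levels concrete, the following is a minimal Python sketch of how one sentence with a keyword and its scope might be represented in memory. It is an illustration only, not the corpus's actual file format: the class names, field names, and the example sentence are invented here, and the choice to store the scope as an inclusive token range that contains the keyword is an assumption of the sketch.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ScopeAnnotation:
    """One keyword and its scope (hypothetical layout, not the corpus format)."""
    kind: str               # "negation" or "speculation"
    cue_tokens: List[int]   # token indices of the keyword
    scope_start: int        # first token index in the scope (inclusive)
    scope_end: int          # last token index in the scope (inclusive)


@dataclass
class Sentence:
    tokens: List[str]
    annotations: List[ScopeAnnotation]


def scope_text(sentence: Sentence, ann: ScopeAnnotation) -> str:
    """Return the tokens covered by the scope, joined by spaces."""
    return " ".join(sentence.tokens[ann.scope_start:ann.scope_end + 1])


def annotators_agree(a: ScopeAnnotation, b: ScopeAnnotation) -> bool:
    """Exact-match comparison of two annotators' markup for one keyword."""
    return (
        a.kind == b.kind
        and a.cue_tokens == b.cue_tokens
        and (a.scope_start, a.scope_end) == (b.scope_start, b.scope_end)
    )


# Invented example sentence; it is not taken from the corpus.
sent = Sentence(
    tokens=["The", "scan", "shows", "no", "evidence", "of", "pneumonia", "."],
    annotations=[ScopeAnnotation("negation", [3], 3, 6)],
)

print(scope_text(sent, sent.annotations[0]))  # no evidence of pneumonia
```

Under this layout, a disagreement that the chief annotator would need to resolve is simply a pair of annotations for which `annotators_agree` returns `False`, for example when the two annotators mark the same keyword but end the scope at different tokens.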