Towards a better understanding of uncertainties and speculations in Swedish clinical text: analysis of an initial annotation trial

  • Authors:
  • Sumithra Velupillai

  • Affiliations:
  • Stockholm University, Kista, Sweden

  • Venue:
  • NeSp-NLP '10 Proceedings of the Workshop on Negation and Speculation in Natural Language Processing
  • Year:
  • 2010

Quantified Score

Hi-index 0.00

Visualization

Abstract

Electronic Health Records (EHRs) contain a large amount of free text documentation which is potentially very useful for Information Retrieval and Text Mining applications. We have, in an initial annotation trial, annotated 6 739 sentences randomly extracted from a corpus of Swedish EHRs for sentence level (un)certainty, and token level speculative keywords and negations. This set is split into different clinical practices and analyzed by means of descriptive statistics and pairwise Inter-Annotator Agreement (IAA) measured by F1-score. We identify geriatrics as a clinical practice with a low average amount of uncertain sentences and a high average IAA, and neurology with a high average amount of uncertain sentences. Speculative words are often n-grams, and uncertain sentences longer than average. The results of this analysis is to be used in the creation of a new annotated corpus where we will refine and further develop the initial annotation guidelines and introduce more levels of dimensionality. Once we have finalized our guidelines and refined the annotations we plan to release the corpus for further research, after ensuring that no identifiable information is included.