Towards a better understanding of uncertainties and speculations in Swedish clinical text: analysis of an initial annotation trial

Authors:
Sumithra Velupillai
Affiliations:
Stockholm University, Kista, Sweden
Venue:
NeSp-NLP '10 Proceedings of the Workshop on Negation and Speculation in Natural Language Processing
Year:
2010

Citing 5

Knowtator: a Protégé plug-in for annotated corpus construction
NAACL-Demonstrations '06 Proceedings of the 2006 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology: companion volume: demonstrations

Learning the scope of hedge cues in biomedical texts
BioNLP '09 Proceedings of the Workshop on Current Trends in Biomedical Natural Language Processing

Exploiting 'subjective' annotations
HumanJudge '08 Proceedings of the Workshop on Human Judgements in Computational Linguistics

Detecting speculations and their scopes in scientific text
EMNLP '09 Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing: Volume 3

Creating and evaluating a consensus for negated and speculative words in a Swedish clinical corpus
NeSp-NLP '10 Proceedings of the Workshop on Negation and Speculation in Natural Language Processing

Cited 2

Creating and evaluating a consensus for negated and speculative words in a Swedish clinical corpus
NeSp-NLP '10 Proceedings of the Workshop on Negation and Speculation in Natural Language Processing

Linking uncertainty in physicians' narratives to diagnostic correctness
ExProM '12 Proceedings of the Workshop on Extra-Propositional Aspects of Meaning in Computational Linguistics

Abstract

Electronic Health Records (EHRs) contain a large amount of free-text documentation that is potentially very useful for Information Retrieval and Text Mining applications. In an initial annotation trial, we annotated 6,739 sentences randomly extracted from a corpus of Swedish EHRs for sentence-level (un)certainty and for token-level speculative keywords and negations. This set is split by clinical practice and analyzed by means of descriptive statistics and pairwise Inter-Annotator Agreement (IAA) measured by F1-score. We identify geriatrics as a clinical practice with a low average amount of uncertain sentences and a high average IAA, and neurology as one with a high average amount of uncertain sentences. Speculative words are often n-grams, and uncertain sentences are longer than average. The results of this analysis are to be used in the creation of a new annotated corpus, for which we will refine and further develop the initial annotation guidelines and introduce more levels of dimensionality. Once we have finalized our guidelines and refined the annotations, we plan to release the corpus for further research, after ensuring that no identifiable information is included.
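
As an illustration of the agreement measure named in the abstract, the following is a minimal sketch of pairwise IAA computed as an F1-score, assuming each annotator's output can be represented as a set of annotated items (for example, the ids of sentences marked as uncertain). The function names, the annotator labels, and the data are hypothetical and are not taken from the paper; the paper's actual matching criteria may differ.

```python
from itertools import combinations


def pairwise_f1(reference, other):
    """F1-score of `other` measured against `reference` (both sets of item ids)."""
    reference, other = set(reference), set(other)
    if not reference and not other:
        # Assumption: two empty annotation sets count as full agreement.
        return 1.0
    overlap = len(reference & other)
    if overlap == 0:
        return 0.0
    precision = overlap / len(other)
    recall = overlap / len(reference)
    return 2 * precision * recall / (precision + recall)


def average_pairwise_f1(annotations):
    """Mean F1 over all unordered annotator pairs in a {name: set} mapping."""
    pairs = list(combinations(annotations, 2))
    scores = [pairwise_f1(annotations[a], annotations[b]) for a, b in pairs]
    return sum(scores) / len(scores)


# Hypothetical data: sentence ids that each annotator marked as uncertain.
annotations = {
    "A": {1, 2, 3, 4},
    "B": {2, 3, 4, 5},
    "C": {1, 2, 3},
}

print(round(average_pairwise_f1(annotations), 3))
```

Computed this way, F1 is symmetric in the two annotators (it equals twice the overlap divided by the sum of the two set sizes), so neither annotator has to be treated as the gold standard.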