Issues on quality assessment of SNOMED CT® subsets: term validation and term extraction

  • Authors:
  • Dimitrios Kokkinakis; Ulla Gerdin

  • Affiliations:
  • University of Gothenburg, Gothenburg, Sweden; The National Board of Health and Welfare, Stockholm, Sweden

  • Venue:
  • WBIE '09 Proceedings of the Workshop on Biomedical Information Extraction
  • Year:
  • 2009

Abstract

The aim of this paper is to develop and apply methods based on Natural Language Processing for automatically testing the validity, reliability and coverage of various Swedish subsets of SNOMED CT (the Systematized NOmenclature of MEDicine - Clinical Terms), a multiaxial, hierarchical classification system that is currently being translated from English into Swedish. Our work has proceeded along two dimensions. First, a Swedish electronic text collection of scientific medical documents has been compiled and converted to a uniform format. Second, a term processing activity has taken place. In the first phase of this activity, various SNOMED CT subsets have been mapped to the text collection in order to evaluate the validity and reliability of the translated terms. In parallel, a large number of term candidates have been extracted from the corpus in order to examine the coverage of SNOMED CT. Term candidates that are currently not included in the Swedish SNOMED CT can be parts of compounds, parts of potential multiword terms, terms that have not yet been translated, or potentially new candidates. To achieve these goals, a number of automatic term recognition algorithms have been applied to the corpus. The results of the latter process are to be reviewed, through a suitable interface, by domain experts (relevant to the subsets extracted), who can decide whether a new set of terms should be incorporated into the Swedish translation of SNOMED CT.
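
The two term processing steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`tokenize`, `term_frequencies`, `candidate_terms`), the parameters `max_len` and `min_freq`, and the simple frequency-based n-gram filter are all assumptions made for illustration, standing in for the unspecified automatic term recognition algorithms. The first function maps a list of translated terms onto a corpus and counts their occurrences (a rough proxy for validation); the second collects frequent word sequences that are absent from the term list (a rough proxy for coverage checking).

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens (hypothetical helper)."""
    return re.findall(r"\w+", text.lower())


def term_frequencies(terms, documents):
    """Count exact token-sequence matches of each term in the corpus.

    A term with zero hits is a candidate for review by a domain expert.
    """
    term_tokens = {term: tuple(tokenize(term)) for term in terms}
    counts = {term: 0 for term in terms}
    for doc in documents:
        tokens = tokenize(doc)
        for term, seq in term_tokens.items():
            n = len(seq)
            if n == 0:
                continue
            for i in range(len(tokens) - n + 1):
                if tuple(tokens[i:i + n]) == seq:
                    counts[term] += 1
    return counts


def candidate_terms(terms, documents, max_len=2, min_freq=2):
    """Return frequent n-grams (up to max_len tokens) missing from the term list.

    Assumption: raw frequency is used as the only termhood signal here;
    real term recognition would add linguistic and statistical filters.
    """
    known = {tuple(tokenize(term)) for term in terms}
    ngrams = Counter()
    for doc in documents:
        tokens = tokenize(doc)
        for n in range(1, max_len + 1):
            for i in range(len(tokens) - n + 1):
                ngrams[tuple(tokens[i:i + n])] += 1
    return {
        " ".join(gram): freq
        for gram, freq in ngrams.items()
        if freq >= min_freq and gram not in known
    }
```

Note that such a filter will also return fragments of known multiword terms, which matches the abstract's observation that candidates may be parts of compounds or of multiword terms rather than genuinely new terms; this is one reason the output is handed to domain experts instead of being added automatically.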