Issues on quality assessment of SNOMED CT® subsets: term validation and term extraction

Authors:
Dimitrios Kokkinakis; Ulla Gerdin
Affiliations:
University of Gothenburg, Gothenburg, Sweden; The National Board of Health and Welfare, Stockholm, Sweden
Venue:
WBIE '09 Proceedings of the Workshop on Biomedical Information Extraction
Year:
2009

Abstract

The aim of this paper is to apply and develop methods based on Natural Language Processing for automatically testing the validity, reliability and coverage of various Swedish subsets of SNOMED CT (the Systematized NOmenclature of MEDicine - Clinical Terms), a multiaxial, hierarchical classification system that is currently being translated from English to Swedish. Our work has been developed along two dimensions. First, a Swedish electronic text collection of scientific medical documents has been collected and converted to a uniform format. Second, a term processing activity has taken place. In the first phase of this activity, various SNOMED CT subsets have been mapped to the text collection in order to evaluate the validity and reliability of the translated terms. In parallel, a large number of term candidates have been extracted from the corpus in order to examine the coverage of SNOMED CT. Term candidates that are currently not included in the Swedish SNOMED CT can be parts of compounds, parts of potential multiword terms, terms that have not yet been translated, or potentially new candidates. To achieve these goals, a number of automatic term recognition algorithms have been applied to the corpus. The results of the latter process are to be reviewed, through a suitable interface, by domain experts (relevant to the subsets extracted), who can decide whether a new set of terms should be incorporated in the Swedish translation of SNOMED CT.
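
The two term processing steps described in the abstract (mapping a terminology subset onto a corpus, and collecting candidates that the subset does not cover) can be illustrated with a minimal Python sketch. This is not the authors' system: the function names, the three Swedish example terms, the toy corpus and the plain frequency threshold are all assumptions made for illustration. A real automatic term recognition pipeline would normally add part-of-speech filtering, stop-word removal and statistical ranking.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep runs of letters (including Swedish å, ä, ö)."""
    return re.findall(r"[a-zåäöé]+", text.lower())


def attested_terms(terms, tokens):
    """Return the terms that occur in the corpus as a contiguous token sequence."""
    n_max = max((len(t.split()) for t in terms), default=0)
    seen = set()
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            seen.add(" ".join(tokens[i:i + n]))
    return {t for t in terms if t.lower() in seen}


def candidate_terms(terms, tokens, min_freq=2):
    """Return unigrams and bigrams seen at least min_freq times that are not known terms."""
    known = {t.lower() for t in terms}
    counts = Counter(tokens)
    counts.update(" ".join(pair) for pair in zip(tokens, tokens[1:]))
    return {ng: c for ng, c in counts.items() if c >= min_freq and ng not in known}


# Hypothetical terminology subset and toy corpus (assumptions, not project data).
terms = ["hjärtinfarkt", "akut njursvikt", "diabetes mellitus"]
corpus = (
    "Patienten hade akut njursvikt. Akut njursvikt behandlades med dialys. "
    "Dialys gavs efter hjärtinfarkt."
)
tokens = tokenize(corpus)

print(sorted(attested_terms(terms, tokens)))
# → ['akut njursvikt', 'hjärtinfarkt']
print(sorted(candidate_terms(terms, tokens).items()))
# → [('akut', 2), ('dialys', 2), ('njursvikt', 2)]
```

In this toy run, "diabetes mellitus" is never attested in the corpus, "akut" and "njursvikt" surface as parts of a known multiword term, and "dialys" is a potentially new candidate. These are the kinds of cases the abstract leaves to domain experts for review.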