Quality assurance of the content of a large DL-based terminology using mixed lexical and semantic criteria: experience with SNOMED CT

  • Authors:
  • Alan Rector; Luigi Iannone; Robert Stevens

  • Affiliations:
  • University of Manchester, Manchester, United Kingdom; University of Manchester, Manchester, United Kingdom; University of Manchester, Manchester, United Kingdom

  • Venue:
  • Proceedings of the sixth international conference on Knowledge capture
  • Year:
  • 2011

Abstract

SNOMED CT is a large medical terminology based on description logic and mandated for use in the US, UK and several other countries. The hierarchies are known to contain many errors, but have so far proved difficult to analyse or quality assure. We present a series of methods and lessons learnt from experience in quality assuring a "module" of SNOMED for specific applications that we expect to generalize both to SNOMED as a whole and to other large ontologies. The methods feature a) dependence on domain expertise; b) starting from classes selected for relevance to specific applications; c) tracing all errors to their root and verifying repairs by reclassification; d) extraction of manageable-sized "modules"; e) mixed semantic and lexical criteria; and f) extensive use of scripting. They aim to reduce the cognitive load on experts by a) looking initially upwards rather than downwards in the hierarchies, and b) breaking up long lists of direct subclasses by introducing definitions for meaningful subcategories. Errors found range from simple mistakes to systematic errors in schemas.
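
As a rough illustration of what a mixed lexical and semantic check (item e) might look like when scripted, the sketch below flags classes whose label suggests membership of a category (lexical criterion) but which are not classified under it (semantic criterion). It is not the authors' tooling: the toy hierarchy, the class names, and the functions `ancestors` and `lexical_semantic_mismatches` are all hypothetical, and a real check would run against the classified ontology rather than a hand-written dictionary.

```python
from typing import Dict, List, Set

# Toy hierarchy: class label -> direct superclasses.
# Hypothetical data for illustration only, not real SNOMED CT content.
PARENTS: Dict[str, List[str]] = {
    "Disorder": [],
    "Procedure": [],
    "Heart disease": ["Disorder"],
    "Myocardial infarction": ["Heart disease"],
    "Hypertensive heart disease": ["Heart disease"],
    "Heart disease screening": ["Procedure"],
    "Rheumatic heart disease": ["Disorder"],  # deliberately misplaced in this toy data
}


def ancestors(cls: str, parents: Dict[str, List[str]]) -> Set[str]:
    """Look upwards: collect every direct and indirect superclass of cls."""
    seen: Set[str] = set()
    stack = list(parents.get(cls, []))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, []))
    return seen


def lexical_semantic_mismatches(
    parents: Dict[str, List[str]], phrase: str, expected_ancestor: str
) -> List[str]:
    """Return classes whose label ends with phrase (lexical criterion)
    but which are not subsumed by expected_ancestor (semantic criterion)."""
    phrase = phrase.lower()
    return sorted(
        cls
        for cls in parents
        if cls != expected_ancestor
        and cls.lower().endswith(phrase)
        and expected_ancestor not in ancestors(cls, parents)
    )


if __name__ == "__main__":
    # Candidates for expert review, not confirmed errors.
    print(lexical_semantic_mismatches(PARENTS, "heart disease", "Heart disease"))
```

Matching on the end of the label rather than on any substring keeps "Heart disease screening" (a procedure) out of the results. Any flagged class is only a candidate for review by a domain expert, in keeping with item a).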