Quality assurance of the content of a large DL-based terminology using mixed lexical and semantic criteria: experience with SNOMED CT

Authors:
Alan Rector;Luigi Iannone;Robert Stevens
Affiliations:
University of Manchester, Manchester, United Kingdom;University of Manchester, Manchester, United Kingdom;University of Manchester, Manchester, United Kingdom
Venue:
Proceedings of the sixth international conference on Knowledge capture
Year:
2011

Abstract

SNOMED CT is a large medical terminology based on description logic and mandated for use in the US, the UK and several other countries. Its hierarchies are known to contain many errors but have so far proved difficult to analyse or quality assure. We present a series of methods, and lessons learnt, from our experience in quality assuring a "module" of SNOMED for specific applications; we expect these to generalize both to SNOMED as a whole and to other large ontologies. They feature a) dependence on domain expertise; b) starting from classes selected for relevance to specific applications; c) tracing all errors to their root and verifying repairs by reclassification; d) extraction of manageable-sized "modules"; e) mixed semantic and lexical criteria; and f) extensive use of scripting. They aim to reduce the cognitive load on experts by a) looking initially upwards rather than downwards in the hierarchies, and b) breaking up long lists of direct subclasses by introducing definitions for meaningful subcategories. Errors found range from simple mistakes to systematic errors in schemas.
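A minimal sketch can illustrate how a mixed lexical and semantic criterion combines with looking upwards from a selected class. It assumes a toy, hand-written is-a hierarchy (hypothetical data, not actual SNOMED CT content), stored as a dictionary mapping each class to its direct parents. This dictionary stands in for the inferred hierarchy that a description logic reasoner would produce after classification. The functions `ancestors` and `lexical_semantic_mismatches` are hypothetical illustrations, not the authors' scripts. A class is flagged when its label contains a keyword (lexical criterion) but the class is not subsumed by the ancestor that keyword implies (semantic criterion).

```python
from collections import deque

# Hypothetical toy is-a hierarchy: class -> set of direct parents.
# Invented for illustration; NOT real SNOMED CT content.
PARENTS = {
    "Fracture of femur": {"Fracture of bone", "Disorder of femur"},
    "Fracture of bone": {"Injury of musculoskeletal system"},
    "Disorder of femur": {"Disorder of bone"},
    # Suspicious: the label says "fracture", but no fracture parent is recorded.
    "Stress fracture of tibia": {"Disorder of tibia"},
    "Disorder of tibia": {"Disorder of bone"},
    "Disorder of bone": {"Disorder of musculoskeletal system"},
    "Injury of musculoskeletal system": set(),
    "Disorder of musculoskeletal system": set(),
}


def ancestors(cls, parents):
    """Return every transitive superclass of cls (looking upwards, breadth-first)."""
    seen = set()
    queue = deque(parents.get(cls, ()))
    while queue:
        parent = queue.popleft()
        if parent not in seen:
            seen.add(parent)
            queue.extend(parents.get(parent, ()))
    return seen


def lexical_semantic_mismatches(parents, keyword, expected_ancestor):
    """Classes whose label mentions keyword (lexical) but that are not
    subsumed by expected_ancestor in the hierarchy (semantic)."""
    return sorted(
        cls
        for cls in parents
        if keyword.lower() in cls.lower()
        and cls != expected_ancestor
        and expected_ancestor not in ancestors(cls, parents)
    )


if __name__ == "__main__":
    # Look upwards from one selected class.
    print(sorted(ancestors("Fracture of femur", PARENTS)))
    # Flag candidate errors for expert review.
    print(lexical_semantic_mismatches(PARENTS, "fracture", "Fracture of bone"))
    # -> ['Stress fracture of tibia']
```

Each flagged class is only a candidate error for a domain expert to review. In this toy data, "Stress fracture of tibia" is flagged because its only recorded parent is "Disorder of tibia", so nothing above it mentions a fracture.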