Enhancing semantic relation quality of UMLS knowledge sources

  • Authors:
  • Demeke Ayele; Jean-Pierre Chevallet; Getnet Kassie; Million Meshesha

  • Affiliations:
  • Addis Ababa University, Addis Ababa, Ethiopia; University of Grenoble, Grenoble, France; Addis Ababa University, Addis Ababa, Ethiopia; Addis Ababa University, Addis Ababa, Ethiopia

  • Venue:
  • Proceedings of the International Conference on Management of Emergent Digital EcoSystems
  • Year:
  • 2012

Abstract

The quality of semantic tuples (semantic triples of the form subject-predicate-object) has a significant impact on most text mining and knowledge discovery applications, whose practical success and usability depend heavily on the quality of the extracted semantic triples. Most biomedical semantic resources have been developed for different contexts and focus on structural representation, paying less attention to the acceptability and naturalness of individual semantic triples. In this article, we present an integrated approach for enhancing the quality of semantic tuples in the UMLS knowledge sources. The approach integrates three existing auditing techniques: avoiding redundant classification of semantic concepts, reducing hierarchical relationship inconsistencies, and reducing associative relationship inconsistencies. We evaluated the approach by the number of wrongly assigned concepts and inconsistent relationships it identified, and we assessed the quality of each semantic triple in terms of its acceptability and naturalness. The evaluation results are promising. Of 10,082 semantic triples randomly extracted from UMLS, 5646 were taxonomically related and 4436 non-taxonomically related. We found 826 concepts to be redundantly classified and 352 to be hierarchically inconsistent. Of the 4436 non-taxonomic semantic triples, 726 were found to be inconsistent. In addition, domain experts evaluated the quality (acceptability and naturalness) of each of the first 100 semantic triples; Cohen's kappa coefficient, used to measure the degree of agreement between the annotators, was 0.8.
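The abstract names the three auditing checks but does not define them. The Python sketch below illustrates one plausible reading, under stated assumptions: a concept is redundantly classified when it is assigned both a semantic type and one of that type's ancestors; a concept-level is-a link is hierarchically inconsistent when no semantic type of the child equals or descends from a semantic type of the parent; and an associative triple is inconsistent when no pair of the subject's and object's semantic types (or their ancestors) is linked by that relation. The toy type tree `TYPE_PARENT`, the relation set `ALLOWED`, and all function names are hypothetical and only loosely modelled on the UMLS Semantic Network; this is not the authors' implementation.

```python
from itertools import product
from typing import Dict, Optional, Set, Tuple

# Hypothetical toy is-a tree of semantic types (child -> parent, None at the root),
# loosely modelled on the UMLS Semantic Network; not the real hierarchy.
TYPE_PARENT: Dict[str, Optional[str]] = {
    "Entity": None,
    "Physical Object": "Entity",
    "Organism": "Physical Object",
    "Animal": "Organism",
    "Anatomical Structure": "Physical Object",
    "Body Part": "Anatomical Structure",
}

# Hypothetical relations permitted between semantic types: (subject type, relation, object type).
ALLOWED: Set[Tuple[str, str, str]] = {("Organism", "has_part", "Anatomical Structure")}


def ancestors(t: str) -> Set[str]:
    """Return every proper ancestor of semantic type t in TYPE_PARENT."""
    result: Set[str] = set()
    parent = TYPE_PARENT.get(t)
    while parent is not None:
        result.add(parent)
        parent = TYPE_PARENT.get(parent)
    return result


def self_and_ancestors(t: str) -> Set[str]:
    """Return t together with all of its ancestors."""
    return {t} | ancestors(t)


def is_redundant(types: Set[str]) -> bool:
    """Redundant classification: one assigned type is an ancestor of another assigned type."""
    return any(ancestors(t) & types for t in types)


def is_hierarchically_inconsistent(child_types: Set[str], parent_types: Set[str]) -> bool:
    """An is-a link is suspect if no child type equals or descends from some parent type."""
    return not any(self_and_ancestors(t) & parent_types for t in child_types)


def is_associatively_inconsistent(
    subj_types: Set[str], rel: str, obj_types: Set[str], allowed: Set[Tuple[str, str, str]]
) -> bool:
    """A triple is suspect if no (subject type, rel, object type) combination is licensed.

    Assumes a relation allowed between two types is inherited by all their
    descendants; blocked inheritance in the real Semantic Network is ignored.
    """
    for s, o in product(subj_types, obj_types):
        for sa, oa in product(self_and_ancestors(s), self_and_ancestors(o)):
            if (sa, rel, oa) in allowed:
                return False
    return True


print(is_redundant({"Animal", "Organism"}))  # True
print(is_redundant({"Animal", "Body Part"}))  # False
print(is_hierarchically_inconsistent({"Body Part"}, {"Organism"}))  # True
print(is_associatively_inconsistent({"Animal"}, "has_part", {"Body Part"}, ALLOWED))  # False
```

Checking ancestors in `is_associatively_inconsistent` assumes that a relation permitted between two types carries over to their descendants; the real Semantic Network allows inheritance to be blocked in some cases, which this sketch does not model.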
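Cohen's kappa corrects the raw agreement rate for agreement expected by chance: kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed proportion of items both annotators labelled identically and p_e is the agreement expected from each annotator's label frequencies. The minimal Python sketch below assumes two annotators assigning one categorical label per triple; the function name and the labels in the usage example are made up for illustration and are not the paper's data.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Cohen's kappa for two annotators who labelled the same items."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("expected two non-empty label sequences of equal length")
    n = len(a)
    # Observed agreement: fraction of items given identical labels.
    p_o = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement: sum over labels of the product of both annotators' label frequencies.
    count_a, count_b = Counter(a), Counter(b)
    p_e = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)
    if p_e == 1.0:
        # Both annotators used one identical label for every item; kappa is
        # undefined here, so this sketch reports perfect agreement.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


# Hypothetical labels (1 = acceptable, 0 = not acceptable) for ten triples.
annotator_1 = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]
annotator_2 = [1, 1, 1, 1, 1, 0, 0, 0, 0, 1]
print(round(cohens_kappa(annotator_1, annotator_2), 3))  # 0.583
```

In this toy example p_o = 0.8 and p_e = 0.52, so kappa is about 0.583: the same 80% raw agreement yields a noticeably lower kappa once chance agreement is discounted.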