Semantic relatedness study using second order co-occurrence vectors computed from biomedical corpora, UMLS and WordNet

  • Authors:
  • Ying Liu;Bridget T. McInnes;Ted Pedersen;Genevieve Melton-Meaux;Serguei Pakhomov

  • Affiliations:
  • University of Minnesota, Minneapolis, MN, USA;University of Minnesota, Minneapolis, MN, USA;University of Minnesota, Duluth, MN, USA;University of Minnesota, Minneapolis, MN, USA;University of Minnesota, Minneapolis, MN, USA

  • Venue:
  • Proceedings of the 2nd ACM SIGHIT International Health Informatics Symposium
  • Year:
  • 2012

Abstract

Automated measures of semantic relatedness are important for effectively processing medical data in tasks such as information retrieval and natural language processing. In this paper, we present a context vector approach that can compute the semantic relatedness between any pair of concepts in the Unified Medical Language System (UMLS). Our approach was developed on a corpus of inpatient clinical reports. We use 430 pairs of clinical concepts manually rated for semantic relatedness as the reference standard. The experiments demonstrate that incorporating a combination of UMLS and WordNet definitions can improve the semantic relatedness measure. The paper also shows that the second-order co-occurrence vector measure is more effective than path-based methods for measuring semantic relatedness.
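
The abstract does not give implementation details, so the following is only a minimal sketch of the general idea behind a second-order co-occurrence vector measure, not the authors' implementation. It assumes that each word gets a first-order vector of co-occurrence counts from a corpus, that a concept's second-order vector is the sum of the first-order vectors of the words in its definition, and that relatedness is the cosine between two such vectors. The toy corpus, the definitions, the window size, and all function names are hypothetical stand-ins for the clinical reports and the UMLS and WordNet definitions.

```python
import math
from collections import Counter, defaultdict


def first_order_vectors(sentences, window=2):
    """Count, for each word, the words seen within `window` positions of it."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, word in enumerate(tokens):
            start = max(0, i - window)
            stop = min(len(tokens), i + window + 1)
            for j in range(start, stop):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def second_order_vector(definition, vectors):
    """Sum the first-order vectors of every word in a concept definition."""
    total = Counter()
    for word in definition.lower().split():
        # Words never seen in the corpus contribute nothing.
        total.update(vectors.get(word, {}))
    return total


def cosine(u, v):
    """Cosine similarity of two sparse count vectors; 0.0 if either is empty."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical toy corpus standing in for clinical reports.
corpus = [
    "patient has chest pain and shortness of breath",
    "chest pain suggests heart attack",
    "heart failure causes shortness of breath",
    "patient fractured the left leg bone",
]

# Hypothetical definitions standing in for UMLS and WordNet definitions.
definitions = {
    "myocardial infarction": "heart attack with chest pain",
    "dyspnea": "shortness of breath",
    "fracture": "broken leg bone",
}

vectors = first_order_vectors(corpus)


def relatedness(concept_a, concept_b):
    """Cosine between the second-order vectors of two defined concepts."""
    vec_a = second_order_vector(definitions[concept_a], vectors)
    vec_b = second_order_vector(definitions[concept_b], vectors)
    return cosine(vec_a, vec_b)


print(relatedness("myocardial infarction", "dyspnea"))
print(relatedness("myocardial infarction", "fracture"))
```

Combining definitions from two sources, as the abstract describes, would in this sketch amount to concatenating the definition strings before building the second-order vector. A realistic system would also need stop-word filtering and frequency cutoffs, which are omitted here.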