Disambiguation of ambiguous biomedical terms using examples generated from the UMLS Metathesaurus

  • Authors:
  • Mark Stevenson;Yikun Guo

  • Affiliations:
  • Natural Language Processing Group, Department of Computer Science, University of Sheffield, Regent Court, 211 Portobello, Sheffield S1 4DP, United Kingdom;Natural Language Processing Group, Department of Computer Science, University of Sheffield, Regent Court, 211 Portobello, Sheffield S1 4DP, United Kingdom

  • Venue:
  • Journal of Biomedical Informatics
  • Year:
  • 2010

Abstract

Researchers have access to a vast amount of information stored in textual documents, and there is a pressing need for automated methods to enable and improve access to this resource. Lexical ambiguity, the phenomenon in which a word or phrase has more than one possible meaning, presents a significant obstacle to automated text processing. Word Sense Disambiguation (WSD) is a technology that resolves these ambiguities automatically and is an important stage in text understanding. The most accurate approaches to WSD rely on manually labeled examples, but these are usually not available and are prohibitively expensive to create. This paper offers a solution to that problem by using information in the UMLS Metathesaurus to automatically generate labeled examples. Two approaches are presented. The first is an extension of existing work (Liu et al., 2002 [1]) and the second is a novel approach that exploits information in the UMLS that has not previously been used for this purpose. The automatically generated examples are evaluated by comparing them against the manually labeled ones in the NLM-WSD data set and are found to outperform the baseline. The examples generated using the novel approach also improve WSD performance when combined with manually labeled examples.