Using MEDLINE as Standard Corpus for Measuring Semantic Similarity in the Biomedical Domain

  • Authors:
  • Hisham Al-Mubaid;Hoa A. Nguyen

  • Affiliations:
  • University of Houston;University of Houston

  • Venue:
  • BIBE '06 Proceedings of the Sixth IEEE Symposium on BioInformatics and BioEngineering
  • Year:
  • 2006

Abstract

Measuring the similarity between biomedical terms and concepts is an important task in biomedical information extraction and knowledge discovery. We propose and investigate the feasibility of using MEDLINE as a standard corpus, together with the MeSH ontology, for measuring semantic similarity between concepts in the biomedical domain within the UMLS framework. We adapted information-based semantic similarity measures from general English and applied them to the biomedical domain to measure the similarity between biomedical terms. The experimental results show that, using MEDLINE and the MeSH ontology, the information-based similarity measures perform well and correlate highly with human ratings. The Jiang & Conrath similarity measure achieved 82% correlation with human similarity scores, and the average correlation with human scores across the three measures approaches 78%. These results confirm that MEDLINE is an effective information source for measuring semantic similarity between biomedical terms and concepts.
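
As a rough illustration of what an information-based measure computes, the sketch below derives information content (IC) from corpus counts over a small taxonomy and then evaluates the Jiang & Conrath distance, commonly defined as IC(c1) + IC(c2) - 2 * IC(lcs), where lcs is the lowest common subsumer of the two concepts. Everything concrete here is an assumption made for illustration: the taxonomy, the counts, and the function names are hypothetical and are not taken from MeSH, MEDLINE, or the paper, and the abstract does not state how the authors turn this distance into a similarity score.

```python
import math

# Hypothetical toy taxonomy (child -> parent). These are NOT real MeSH entries.
PARENT = {
    "disease": None,
    "infection": "disease",
    "neoplasm": "disease",
    "pneumonia": "infection",
    "sepsis": "infection",
    "carcinoma": "neoplasm",
}

# Hypothetical counts of how often each term itself occurs in the corpus.
TERM_COUNT = {
    "disease": 10,
    "infection": 20,
    "neoplasm": 30,
    "pneumonia": 15,
    "sepsis": 5,
    "carcinoma": 20,
}


def ancestors(concept):
    """Return the path from `concept` up to the root, starting with the concept."""
    path = []
    while concept is not None:
        path.append(concept)
        concept = PARENT[concept]
    return path


def concept_frequencies():
    """Propagate counts upward: an occurrence of a term also counts for its ancestors."""
    freq = {c: 0 for c in PARENT}
    for term, n in TERM_COUNT.items():
        for a in ancestors(term):
            freq[a] += n
    return freq


FREQ = concept_frequencies()
TOTAL = FREQ["disease"]  # the root subsumes every occurrence


def information_content(concept):
    """IC(c) = -log p(c), with p(c) estimated from propagated corpus counts."""
    return -math.log(FREQ[concept] / TOTAL)


def lowest_common_subsumer(c1, c2):
    """First ancestor of c1 (walking upward) that is also an ancestor of c2."""
    anc2 = set(ancestors(c2))
    for a in ancestors(c1):
        if a in anc2:
            return a
    raise ValueError("concepts share no common ancestor")


def jiang_conrath_distance(c1, c2):
    """Jiang & Conrath distance: IC(c1) + IC(c2) - 2 * IC(lcs(c1, c2))."""
    lcs = lowest_common_subsumer(c1, c2)
    return (information_content(c1) + information_content(c2)
            - 2 * information_content(lcs))


if __name__ == "__main__":
    for pair in [("pneumonia", "sepsis"), ("pneumonia", "carcinoma")]:
        print(pair, round(jiang_conrath_distance(*pair), 3))
```

With these made-up counts, the two infections come out closer to each other than an infection and a carcinoma, because their lowest common subsumer is more specific and therefore carries more information. A smaller distance means greater similarity, so a score to compare against human ratings would need some inversion of this value; which one to use is left open here.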