Computing Knowledge-Based Semantic Similarity from the Web: An Application to the Biomedical Domain

  • Authors:
  • David Sánchez;Montserrat Batet;Aida Valls

  • Affiliations:
  • Department of Computer Science and Mathematics, University Rovira i Virgili, Tarragona 43007;Department of Computer Science and Mathematics, University Rovira i Virgili, Tarragona 43007;Department of Computer Science and Mathematics, University Rovira i Virgili, Tarragona 43007

  • Venue:
  • KSEM '09 Proceedings of the 3rd International Conference on Knowledge Science, Engineering and Management
  • Year:
  • 2009

Abstract

Computing the semantic similarity between concepts is a common problem in many language-related tasks and knowledge domains. In the biomedical field, several approaches have been developed to address this problem by exploiting the knowledge available in domain ontologies (SNOMED-CT) and in specific, closed, and reliable corpora (clinical data). In recent years, however, the enormous growth of the Web has led researchers to start using it as the base corpus to assist the semantic analysis of language. This paper proposes and evaluates the use of the Web as a background corpus for measuring the similarity of biomedical concepts. Several classical similarity measures have been considered and tested using a benchmark composed of biomedical terms, and the results have been compared against approaches that use specific clinical data. The results show that the similarity values obtained from the Web are even more reliable than those obtained from specific clinical data, demonstrating the suitability of the Web as an information corpus for the biomedical domain.
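The abstract does not name the classical measures that were tested. As an illustration only, the sketch below assumes the information-content (IC) measures of Resnik, Lin, and Jiang-Conrath, with the probability of a concept estimated from web page-hit counts as IC(c) = -log(hits(c) / total_pages). The small taxonomy (a stand-in for SNOMED-CT), the hit counts, and the total page count are all hypothetical toy values, not data from the paper; in practice the counts would come from querying a web search engine. A Pearson correlation is included because correlating computed similarities with human ratings is a common way to score measures against a benchmark.

```python
import math
from typing import Dict, List, Set

# Hypothetical toy data (NOT from the paper): a tiny taxonomy standing in for
# SNOMED-CT, and invented page-hit counts standing in for web search results.
TOTAL_PAGES = 10**9
TOY_HITS = {
    "disorder": 10**9, "heart disease": 10**7, "lung disease": 10**7,
    "myocardial infarction": 10**6, "angina": 10**6, "asthma": 10**6,
}
TOY_PARENTS: Dict[str, List[str]] = {
    "disorder": [], "heart disease": ["disorder"], "lung disease": ["disorder"],
    "myocardial infarction": ["heart disease"], "angina": ["heart disease"],
    "asthma": ["lung disease"],
}

def information_content(hits: int, total_pages: int) -> float:
    """IC(c) = -log p(c), with p(c) estimated as hits(c) / total_pages."""
    if hits <= 0 or hits > total_pages:
        raise ValueError("hits must be in (0, total_pages]")
    return -math.log(hits / total_pages)

def subsumers(concept: str, parents: Dict[str, List[str]]) -> Set[str]:
    """The concept plus all its ancestors (multiple inheritance allowed)."""
    seen: Set[str] = set()
    stack = [concept]
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(parents.get(node, []))
    return seen

def resnik(a: str, b: str, ic: Dict[str, float], parents) -> float:
    """IC of the most informative common subsumer of a and b."""
    common = subsumers(a, parents) & subsumers(b, parents)
    if not common:
        raise ValueError("concepts share no subsumer")
    return max(ic[c] for c in common)

def lin(a: str, b: str, ic: Dict[str, float], parents) -> float:
    """2 * IC(LCS) / (IC(a) + IC(b))."""
    return 2 * resnik(a, b, ic, parents) / (ic[a] + ic[b])

def jiang_conrath_distance(a: str, b: str, ic: Dict[str, float], parents) -> float:
    """IC(a) + IC(b) - 2 * IC(LCS); smaller means more similar."""
    return ic[a] + ic[b] - 2 * resnik(a, b, ic, parents)

def pearson(xs: List[float], ys: List[float]) -> float:
    """Pearson correlation, e.g. between measure scores and human ratings."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

TOY_IC = {c: information_content(h, TOTAL_PAGES) for c, h in TOY_HITS.items()}

if __name__ == "__main__":
    # With these toy counts, Lin(MI, angina) = IC(heart disease) / IC(MI) ≈ 0.667
    print(lin("myocardial infarction", "angina", TOY_IC, TOY_PARENTS))
```

Taking the maximum IC over all common subsumers, rather than walking a single parent chain, keeps the sketch valid for taxonomies with multiple inheritance such as SNOMED-CT. Note that raw web counts need not decrease monotonically down a taxonomy, which a real implementation would have to account for.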