A Graph Modeling of Semantic Similarity between Words

  • Authors: Marco A. Alvarez; SeungJin Lim
  • Affiliations: Utah State University, USA; Utah State University, USA
  • Venue: ICSC '07 Proceedings of the International Conference on Semantic Computing
  • Year: 2007

Abstract

Measuring the semantic similarity between pairs of words is considered a fundamental operation in data mining and information retrieval. Nevertheless, developing a computational method whose results come close to human judgments remains difficult, partly because similarity is subjective. In this paper, we present a novel algorithm for scoring the semantic similarity between words (SSA). Given two input words w_1 and w_2, SSA exploits their corresponding concepts, relationships, and descriptive glosses available in WordNet to build a rooted weighted graph G_sim. The output score is calculated by exploring the concepts present in G_sim and selecting the minimal distance between any two concepts c_1 and c_2 of w_1 and w_2, respectively. The distance is defined as a combination of: 1) the depth of the nearest common ancestor of c_1 and c_2 in G_sim, 2) the intersection of the descriptive glosses of c_1 and c_2, and 3) the shortest distance between c_1 and c_2 in G_sim. SSA achieves a correlation of 0.913 with the human ratings reported by Miller and Charles [15] for a dataset of 28 pairs of nouns. Furthermore, on the full dataset of 65 pairs presented by Rubenstein and Goodenough [20], the correlation between SSA results and the known human ratings is 0.903, which is higher than that of all other algorithms reported for the same dataset. The high correlations of SSA with human ratings suggest that SSA would be useful in solving several data mining and information retrieval problems.
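
The abstract names the three ingredients of the distance but not how they are combined, so the following is only a minimal sketch under stated assumptions, not the authors' SSA. It uses a tiny hand-made taxonomy in place of WordNet, and the combining formula (shortest path divided by terms that grow with ancestor depth and gloss overlap) is a hypothetical choice made for illustration. All names (`PARENT`, `GLOSS`, `SENSES`, `concept_distance`, `word_distance`) are invented here.

```python
import heapq
from itertools import product

# Toy rooted taxonomy (hypothetical, NOT real WordNet data):
# child -> (parent, edge weight). "entity" is the root.
PARENT = {
    "vehicle": ("entity", 1.0),
    "food": ("entity", 1.0),
    "car": ("vehicle", 1.0),
    "bicycle": ("vehicle", 1.0),
    "fruit": ("food", 1.0),
    "apple": ("fruit", 1.0),
}

# Hypothetical glosses and word-to-concept mapping.
GLOSS = {
    "car": "a wheeled motor vehicle used for transport",
    "bicycle": "a wheeled vehicle moved by pedals",
    "apple": "a round fruit with red or green skin",
}
SENSES = {"car": ["car"], "bike": ["bicycle"], "apple": ["apple"]}
STOP = {"a", "or", "by", "for", "with", "used"}


def ancestors(c):
    """Path from concept c up to the root, starting with c itself."""
    path = [c]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]][0])
    return path


def nca_depth(c1, c2):
    """Depth (edges from the root) of the nearest common ancestor."""
    up1 = set(ancestors(c1))
    for node in ancestors(c2):
        if node in up1:
            return len(ancestors(node)) - 1
    raise ValueError("concepts do not share a root")


def shortest_path(c1, c2):
    """Dijkstra over the taxonomy treated as an undirected weighted graph."""
    adj = {}
    for child, (parent, w) in PARENT.items():
        adj.setdefault(child, []).append((parent, w))
        adj.setdefault(parent, []).append((child, w))
    best = {c1: 0.0}
    heap = [(0.0, c1)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == c2:
            return d
        if d > best.get(node, float("inf")):
            continue
        for nxt, w in adj.get(node, []):
            nd = d + w
            if nd < best.get(nxt, float("inf")):
                best[nxt] = nd
                heapq.heappush(heap, (nd, nxt))
    return float("inf")


def gloss_overlap(c1, c2):
    """Number of non-stopword tokens shared by the two glosses."""
    g1 = set(GLOSS.get(c1, "").split()) - STOP
    g2 = set(GLOSS.get(c2, "").split()) - STOP
    return len(g1 & g2)


def concept_distance(c1, c2):
    """ASSUMED combination: a deeper common ancestor and a larger gloss
    overlap both shrink the path-based distance."""
    return shortest_path(c1, c2) / ((1 + nca_depth(c1, c2)) * (1 + gloss_overlap(c1, c2)))


def word_distance(w1, w2):
    """Minimal distance over all concept pairs of the two words."""
    return min(concept_distance(a, b) for a, b in product(SENSES[w1], SENSES[w2]))


if __name__ == "__main__":
    print(word_distance("car", "bike"))
    print(word_distance("car", "apple"))
```

In this toy setup, "car" and "bike" share a deeper ancestor and two gloss terms, so their distance comes out smaller than that of "car" and "apple". A real implementation would build G_sim from WordNet and would need the weighting and combination defined in the paper itself.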