Computing semantic relatedness using word frequency and layout information of Wikipedia

  • Authors: Patrick Chan; Yoshinori Hijikata; Shogo Nishida

  • Affiliations: Osaka University, Osaka, Japan (all authors)

  • Venue: Proceedings of the 28th Annual ACM Symposium on Applied Computing
  • Year: 2013

Abstract

Computing the semantic relatedness between two words or phrases is an important problem in fields such as information retrieval and natural language processing. One state-of-the-art approach to this problem is Explicit Semantic Analysis (ESA). ESA estimates relevance from word frequency in Wikipedia articles, so it cannot always estimate relevance well for low-frequency words. To improve the relevance estimate for low-frequency words, we use not only word frequency but also layout information in Wikipedia articles. Empirical evaluation shows that, on low-frequency words, our method achieves a better estimate of semantic relatedness than ESA.
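
The frequency-based ESA baseline described in the abstract can be illustrated with a minimal sketch. This follows the commonly described form of ESA, which is an assumption here since the abstract gives no formulas: each word is represented by a vector of TF-IDF weights over Wikipedia articles ("concepts"), and relatedness is the cosine of two such vectors. The toy corpus, the function names `build_concept_vectors` and `relatedness`, and the whitespace tokenization are all hypothetical. The layout-based weighting proposed in the paper is not specified in the abstract and is not reproduced.

```python
import math
from collections import Counter


def build_concept_vectors(articles):
    """Map each word to {article title: TF-IDF weight} (assumed ESA-style weighting)."""
    n = len(articles)
    # Term frequency per article, using naive whitespace tokenization (an assumption).
    tf = {title: Counter(text.lower().split()) for title, text in articles.items()}
    # Document frequency: number of articles containing each word.
    df = Counter()
    for counts in tf.values():
        df.update(counts.keys())
    vectors = {}
    for title, counts in tf.items():
        for word, count in counts.items():
            weight = count * math.log(n / df[word])
            if weight > 0:  # words present in every article carry no signal
                vectors.setdefault(word, {})[title] = weight
    return vectors


def relatedness(vectors, a, b):
    """Cosine similarity between the concept vectors of words a and b."""
    va, vb = vectors.get(a, {}), vectors.get(b, {})
    dot = sum(w * vb.get(concept, 0.0) for concept, w in va.items())
    norm_a = math.sqrt(sum(w * w for w in va.values()))
    norm_b = math.sqrt(sum(w * w for w in vb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # unseen word: no evidence either way
    return dot / (norm_a * norm_b)


# Hypothetical toy corpus standing in for Wikipedia articles.
articles = {
    "Cat": "cat feline pet whiskers cat",
    "Dog": "dog canine pet bark",
    "Car": "car engine wheel road",
}
vectors = build_concept_vectors(articles)
print(relatedness(vectors, "cat", "pet"))
print(relatedness(vectors, "cat", "car"))
```

In this sketch a word that occurs in only one article gets a vector with a single non-zero entry, so its similarity to other words is either near one or exactly zero. That coarse behavior is one way to picture the low-frequency weakness the abstract attributes to ESA.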