Similarity computation of low-frequency Chinese words

  • Authors:
  • Xinghua Fan;Ji Chen

  • Affiliations:
  • College of Computer Science and Technology, University of Posts and Telecommunications, Chongqing, China;College of Computer Science and Technology, University of Posts and Telecommunications, Chongqing, China

  • Venue:
  • FSKD'09 Proceedings of the 6th international conference on Fuzzy systems and knowledge discovery - Volume 1
  • Year:
  • 2009

Abstract

This paper proposes a novel method for computing the similarity of low-frequency Chinese words. It adopts a combined strategy that exploits the HowNet dictionary together with a corpus constructed from Internet search results. The method has three steps: (1) If both words exist in HowNet, their similarity is computed from HowNet. (2) If either of the two words a and b does not exist in HowNet, word a, word b, and the word pair a and b are each used as a query to search the Internet, and a corpus is constructed from the search results; the similarity between the two words is then computed from the contexts in which they occur in this corpus. (3) To ensure that similarities computed from the different sources are comparable, the similarity computed from the constructed corpus is multiplied by a coefficient. Experimental results show that the proposed method effectively addresses the problem of computing low-frequency word similarity.
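
The three steps described in the abstract can be sketched as follows. This is only an illustrative outline under stated assumptions, not the authors' implementation. The abstract does not say how context similarity is measured or how the coefficient is chosen, so the sketch assumes cosine similarity over co-occurrence counts in a small window and a placeholder coefficient of 0.8. The names dictionary_sim (a HowNet-style lookup returning None when a word is missing), search (a function returning already segmented snippets for a query), context_vector, cosine and combined_similarity are all hypothetical.

```python
import math
from collections import Counter
from typing import Callable, List, Optional

Snippet = List[str]  # one search result, already segmented into tokens


def context_vector(word: str, snippets: List[Snippet], window: int = 2) -> Counter:
    """Count tokens occurring within `window` positions of `word`."""
    counts: Counter = Counter()
    for tokens in snippets:
        for i, tok in enumerate(tokens):
            if tok != word:
                continue
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    counts[tokens[j]] += 1
    return counts


def cosine(u: Counter, v: Counter) -> float:
    """Cosine similarity of two sparse count vectors (0.0 if either is empty)."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def combined_similarity(
    a: str,
    b: str,
    dictionary_sim: Callable[[str, str], Optional[float]],
    search: Callable[[str], List[Snippet]],
    coefficient: float = 0.8,  # assumed placeholder; the abstract gives no value
) -> float:
    # Step 1: use the dictionary when it covers both words.
    score = dictionary_sim(a, b)
    if score is not None:
        return score
    # Step 2: query with a, with b, and with the pair, then pool the results
    # into one corpus and compare the contexts of the two words.
    corpus = search(a) + search(b) + search(a + " " + b)
    raw = cosine(context_vector(a, corpus), context_vector(b, corpus))
    # Step 3: rescale so corpus-based scores are comparable to dictionary scores.
    return coefficient * raw
```

In practice, dictionary_sim would wrap whatever HowNet-based similarity measure is available, and search would wrap a search engine client followed by Chinese word segmentation; both are left as parameters here because the abstract does not specify them.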