Similarity computation of low-frequency Chinese words

Authors:
Xinghua Fan; Ji Chen
Affiliations:
College of Computer Science and Technology, University of Posts and Telecommunications, Chongqing, China; College of Computer Science and Technology, University of Posts and Telecommunications, Chongqing, China
Venue:
FSKD '09: Proceedings of the 6th International Conference on Fuzzy Systems and Knowledge Discovery, Volume 1
Year:
2009

Abstract

This paper proposes a novel method for computing the similarity of low-frequency Chinese words. It adopts a combined strategy that exploits both the HowNet dictionary and a corpus constructed from Internet search results. The method has three steps: (1) If both words exist in HowNet, their similarity is computed from HowNet. (2) If either word a or word b is absent from HowNet, word a, word b, and the word pair a and b are each used as a query to search the Internet, and a corpus is constructed from the search results; the similarity between the two words is then computed from their contexts in this corpus. (3) To ensure that similarities computed from the two different sources are comparable, the similarity computed from the constructed corpus is multiplied by a coefficient. Experimental results show that the proposed method effectively addresses the problem of computing similarity for low-frequency words.
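The three-step combination strategy in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the HowNet-based measure, how contexts are represented, which similarity function is applied to them, or the value of the coefficient. Here the HowNet measure is passed in as an arbitrary callable (`hownet_sim`, hypothetical), HowNet coverage is a hypothetical set `hownet_vocab`, contexts are assumed to be co-occurrence counts within a fixed token window compared by cosine similarity, and the coefficient `alpha = 0.8` is an assumed placeholder. Internet retrieval and Chinese word segmentation are out of scope, so the corpus is assumed to be already-segmented search-result snippets.

```python
from collections import Counter
from math import sqrt
from typing import Callable, Iterable, List, Sequence, Set


def context_vector(word: str, corpus: Iterable[Sequence[str]], window: int = 2) -> Counter:
    """Count the tokens that occur within `window` positions of `word` (assumed context model)."""
    vec: Counter = Counter()
    for tokens in corpus:
        for i, tok in enumerate(tokens):
            if tok != word:
                continue
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:  # skip the target word itself
                    vec[tokens[j]] += 1
    return vec


def cosine(u: Counter, v: Counter) -> float:
    """Cosine similarity of two sparse count vectors; 0.0 if either vector is empty."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm = sqrt(sum(c * c for c in u.values())) * sqrt(sum(c * c for c in v.values()))
    return dot / norm if norm else 0.0


def word_similarity(
    a: str,
    b: str,
    hownet_vocab: Set[str],                   # hypothetical: words covered by HowNet
    hownet_sim: Callable[[str, str], float],  # hypothetical: any HowNet-based measure
    corpus: Iterable[Sequence[str]],          # hypothetical: segmented search-result snippets
    alpha: float = 0.8,                       # assumed coefficient, not taken from the paper
    window: int = 2,
) -> float:
    # Step (1): both words are in HowNet, so use the dictionary-based measure.
    if a in hownet_vocab and b in hownet_vocab:
        return hownet_sim(a, b)
    # Step (2): otherwise compare the words' contexts in the constructed corpus.
    docs: List[Sequence[str]] = list(corpus)  # materialize so it can be scanned twice
    sim = cosine(context_vector(a, docs, window), context_vector(b, docs, window))
    # Step (3): scale the corpus-based score so it is comparable with HowNet scores.
    return alpha * sim


if __name__ == "__main__":
    snippets = [["我", "喜欢", "苹果"], ["我", "喜欢", "香蕉"],
                ["他", "吃", "苹果"], ["他", "吃", "香蕉"]]
    # Neither word is in the (empty) HowNet vocabulary, so the corpus branch is used.
    print(word_similarity("苹果", "香蕉", set(), lambda x, y: 0.0, snippets))  # 0.8
```

In the toy corpus both words share exactly the same neighbours, so their context cosine is 1.0 and the scaled corpus score is `alpha`. Keeping the HowNet measure as a parameter reflects that the abstract leaves it unspecified; in practice the coefficient would be tuned so that corpus-based scores fall on the same scale as HowNet-based ones.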