Positioning unknown words in a thesaurus by using information extracted from a corpus

  • Authors: Naohiko Uramoto
  • Affiliations: Tokyo Research Laboratory, Kanagawa-ken, Japan
  • Venue: COLING '96 Proceedings of the 16th conference on Computational linguistics - Volume 2
  • Year: 1996

Abstract

This paper describes a method for positioning unknown words in an existing thesaurus by using word-to-word relationships with relation (case) markers extracted from a large corpus. A suitable area of the thesaurus for an unknown word is estimated by integrating the human intuition buried in the thesaurus and statistical data extracted from the corpus. To overcome the problem of data sparseness, distinguishing features of each node, called "viewpoints," are extracted automatically and used to calculate the similarity between the unknown word and a word in the thesaurus. The results of an experiment confirm the contribution of viewpoints to the positioning task.
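
The abstract does not give the paper's formulas, so the following is only a rough sketch of the general idea, not the author's method. It assumes that each word is represented by a set of (case marker, co-occurring word) pairs, that a node's "viewpoints" are the pairs shared by all of its member words and absent from every other node, and that an unknown word is scored by the fraction of a node's viewpoints it exhibits. The data, the function names (`viewpoints`, `score`, `position`), and the scoring rule are all hypothetical.

```python
# Toy corpus relations: word -> set of (case marker, co-occurring word) pairs.
# All data below is invented for illustration.
RELATIONS = {
    "beer":  {("obj", "drink"), ("obj", "brew"), ("obj", "buy")},
    "wine":  {("obj", "drink"), ("obj", "pour"), ("obj", "buy")},
    "car":   {("obj", "drive"), ("obj", "park"), ("obj", "buy")},
    "truck": {("obj", "drive"), ("obj", "load"), ("obj", "buy")},
}

# Toy thesaurus: node name -> member words (a flat stand-in for a real hierarchy).
THESAURUS = {
    "beverage": ["beer", "wine"],
    "vehicle": ["car", "truck"],
}


def viewpoints(node, thesaurus, relations):
    """Assumed definition: pairs shared by every member of `node`
    and not seen with any member of another node."""
    shared = set.intersection(*(relations[w] for w in thesaurus[node]))
    elsewhere = set()
    for other, words in thesaurus.items():
        if other != node:
            for w in words:
                elsewhere |= relations[w]
    return shared - elsewhere


def score(unknown_features, node, thesaurus, relations):
    """Fraction of the node's viewpoints that the unknown word exhibits."""
    vp = viewpoints(node, thesaurus, relations)
    if not vp:
        return 0.0
    return len(unknown_features & vp) / len(vp)


def position(unknown_features, thesaurus, relations):
    """Return the node with the highest viewpoint-based score."""
    return max(thesaurus, key=lambda n: score(unknown_features, n, thesaurus, relations))


# Relations observed for a hypothetical unknown word.
sake = {("obj", "drink"), ("obj", "buy")}
best_node = position(sake, THESAURUS, RELATIONS)
```

In this toy setup the pair `("obj", "buy")` occurs with every word and so distinguishes no node; restricting the comparison to viewpoints discards it, which is one way to read the abstract's claim that viewpoints help when corpus data is sparse.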