Hypothesizing word association from untagged text

  • Authors: Tomoyoshi Matsukawa

  • Affiliations: BBN Systems and Technologies, Cambridge, MA

  • Venue: HLT '93 Proceedings of the workshop on Human Language Technology
  • Year: 1993

Abstract

This paper reports a new method for suggesting word associations. It is based on a greedy algorithm that applies chi-square statistics to the joint frequencies of pairs of word groups, comparing them against chance co-occurrence. The approach has two benefits: 1) even low-frequency words and word pairs can be considered, and 2) word groups and word associations can be generated automatically. The method achieved 87% accuracy in hypothesizing word associations for unobserved combinations of words in Japanese text.
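
As an illustration of the kind of statistic the abstract refers to, the sketch below computes the standard Pearson chi-square for a 2x2 co-occurrence table and checks whether the observed joint frequency exceeds the chance expectation. The abstract does not give the paper's exact formulation, so the function names (`chi_square_2x2`, `is_above_chance`) and the way the counts are laid out are assumptions made for illustration only. For word groups, one plausible reading is to sum the counts of the member words before calling these functions.

```python
def chi_square_2x2(joint, count_x, count_y, total):
    """Pearson chi-square for a 2x2 co-occurrence table (illustrative sketch).

    joint   -- number of contexts where x and y occur together
    count_x -- number of contexts containing x
    count_y -- number of contexts containing y
    total   -- total number of contexts
    These argument names are hypothetical, not taken from the paper.
    """
    a = joint                               # x and y together
    b = count_x - joint                     # x without y
    c = count_y - joint                     # y without x
    d = total - count_x - count_y + joint   # neither x nor y
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        # A degenerate table (an empty row or column) carries no evidence.
        return 0.0
    return total * (a * d - b * c) ** 2 / denom


def is_above_chance(joint, count_x, count_y, total):
    """True if the observed joint count exceeds the count expected by chance.

    Chi-square is symmetric, so a large value can also signal that two
    items co-occur less often than chance; this check separates the cases.
    """
    return joint * total > count_x * count_y


if __name__ == "__main__":
    # Made-up counts: x and y each appear in 20 of 100 contexts, 10 jointly.
    print(chi_square_2x2(10, 20, 20, 100))
    print(is_above_chance(10, 20, 20, 100))
```

With the made-up counts in the example, the expected joint count under independence is 20 × 20 / 100 = 4, so an observed count of 10 is well above chance and the statistic is correspondingly large.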