Constructing Empirical Formulas for Testing Word Similarity by the Inductive Method of Model Self-Organization

  • Authors:
  • Pavel Makagonov; Mikhail Alexandrov

  • Venue:
  • PorTAL '02 Proceedings of the Third International Conference on Advances in Natural Language Processing
  • Year:
  • 2002

Abstract

Identification of words with the same base meaning is a necessary procedure for many algorithms of computational linguistics and text processing. For this purpose, we propose a knowledge-poor approach that uses an empirical formula based on the number of coincident letters in the initial parts of the two words and the number of non-coincident letters in their final parts. To construct such a formula for a given language, we use the inductive method of model self-organization developed by A. Ivahnenko. This method considers a set of models (formulas) of a given class and selects the best ones using training samples and test samples. We give a detailed example for English. We also show how to apply the formula to create a word frequency list.
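The abstract does not state the formula itself, so the following Python sketch only illustrates the kind of quantities involved. It assumes a hypothetical linear decision rule `n / s <= a + b * y`, where `y` is the length of the common initial part, `n` is the number of non-coincident letters in the final parts, and `s` is the total number of letters in both words. The function names and the coefficients `a` and `b` are illustrative assumptions, not values from the paper; in the described approach the formula would be selected by the self-organization procedure from training and test samples.

```python
from collections import Counter


def prefix_features(w1, w2):
    """Return (y, n, s) for two words.

    y: number of coincident letters in the common initial part
    n: number of letters left over in the two final parts
    s: total number of letters in both words
    """
    y = 0
    for c1, c2 in zip(w1, w2):
        if c1 != c2:
            break
        y += 1
    n = (len(w1) - y) + (len(w2) - y)
    s = len(w1) + len(w2)
    return y, n, s


def looks_similar(w1, w2, a=0.0, b=0.2):
    """Hypothetical rule: similar if n / s <= a + b * y.

    The coefficients a and b are placeholders chosen for illustration only.
    """
    y, n, s = prefix_features(w1.lower(), w2.lower())
    if s == 0:
        return False
    return n / s <= a + b * y


def merged_frequencies(tokens):
    """Build a word frequency list, merging alphabetically adjacent similar words.

    Each group is represented by its first word in sorted order.
    """
    counts = Counter(t.lower() for t in tokens)
    groups = {}
    rep = None
    for word in sorted(counts):
        if rep is not None and looks_similar(rep, word):
            groups[rep] += counts[word]
        else:
            rep = word
            groups[rep] = counts[word]
    return groups
```

For example, `merged_frequencies(["ask", "asked", "asking", "cat", "move", "moving"])` groups the forms of "ask" and "move" under their first alphabetical form. A rule this simple will also merge unrelated words that share a long prefix, which is why the choice of formula and coefficients has to be validated on labeled word pairs.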