Constructing Empirical Formulas for Testing Word Similarity by the Inductive Method of Model Self-Organization

Authors:
Pavel Makagonov; Mikhail Alexandrov
Venue:
PorTAL '02 Proceedings of the Third International Conference on Advances in Natural Language Processing
Year:
2002

Citing 4

Foundations of statistical natural language processing

Proceedings of the Third International Conference on Computational Linguistics and Intelligent Text Processing
CICLing '02 Proceedings of the Third International Conference on Computational Linguistics and Intelligent Text Processing

Empirical Formula for Testing Word Similarity and Its Application for Constructing a Word Frequency List
CICLing '02 Proceedings of the Third International Conference on Computational Linguistics and Intelligent Text Processing

Approach to construction of automatic morphological analysis systems for inflective languages with little effort
CICLing'03 Proceedings of the 4th international conference on Computational linguistics and intelligent text processing

Cited 1

Constructing empirical models for automatic dialog parameterization
TSD'07 Proceedings of the 10th international conference on Text, speech and dialogue

Abstract

Identifying words that share the same base meaning is a necessary step in many algorithms of computational linguistics and text processing. For this task we propose a knowledge-poor approach: an empirical formula based on the number of coincident letters in the initial parts of two words and the number of non-coincident letters in their final parts. To construct such a formula for a given language, we use the inductive method of model self-organization developed by A. Ivahnenko. This method considers a set of models (formulas) of a given class and selects the best ones using training samples and test samples. We give a detailed example for English and show how to apply the formula to create a word frequency list.
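
The idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' formula: the decision rule n / s <= a + b * y, the parameter grids, the model class names, and the toy word pairs are all assumptions made for illustration. Only the two quantities (coincident initial letters y, non-coincident final letters n) and the principle of fitting on a training sample while selecting on a test sample come from the abstract.

```python
from itertools import product


def prefix_suffix_counts(w1, w2):
    """Return (y, n): y = coincident letters in the initial parts,
    n = non-coincident letters in the final parts of both words."""
    y = 0
    for c1, c2 in zip(w1, w2):
        if c1 != c2:
            break
        y += 1
    n = (len(w1) - y) + (len(w2) - y)
    return y, n


def similar(w1, w2, a, b):
    """Hypothetical rule (an assumption, not the paper's formula):
    the words are similar if n / s <= a + b * y, where s is the total length."""
    y, n = prefix_suffix_counts(w1, w2)
    s = len(w1) + len(w2)
    return n / s <= a + b * y


def error_rate(pairs, a, b):
    """Share of labelled pairs (w1, w2, is_similar) that the rule gets wrong."""
    wrong = sum(similar(w1, w2, a, b) != label for w1, w2, label in pairs)
    return wrong / len(pairs)


def fit(train, a_grid, b_grid):
    """Pick the (a, b) from the grids with the lowest training error."""
    return min(product(a_grid, b_grid), key=lambda p: error_rate(train, *p))


def select_model(train, test, model_classes):
    """Fit each model class on the training sample, then keep the class
    with the lowest error on the separate test sample."""
    best = None
    for name, (a_grid, b_grid) in model_classes.items():
        a, b = fit(train, a_grid, b_grid)
        err = error_rate(test, a, b)
        if best is None or err < best[0]:
            best = (err, name, a, b)
    return best


# Toy, hand-labelled pairs; purely illustrative, not data from the paper.
TRAIN = [("connect", "connected", True), ("move", "moving", True),
         ("contact", "contract", False), ("cat", "dog", False)]
TEST = [("test", "tested", True), ("card", "cart", False)]

# Two hypothetical model classes: a constant threshold and a linear one.
A_GRID = [0.1, 0.2, 0.3, 0.4, 0.5]
MODEL_CLASSES = {
    "constant": (A_GRID, [0.0]),
    "linear": (A_GRID, [-0.05, 0.0, 0.05]),
}

if __name__ == "__main__":
    print(prefix_suffix_counts("connect", "connected"))
    print(select_model(TRAIN, TEST, MODEL_CLASSES))
```

Scoring candidates on a sample that was not used for fitting is what keeps a more flexible model class from winning merely because it fits the training pairs more closely. A realistic experiment would need far larger samples and a richer family of candidate formulas than this sketch uses.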