Foundations of statistical natural language processing
CICLing '02 Proceedings of the Third International Conference on Computational Linguistics and Intelligent Text Processing
CICLing'03 Proceedings of the 4th international conference on Computational linguistics and intelligent text processing
Constructing empirical models for automatic dialog parameterization
TSD'07 Proceedings of the 10th international conference on Text, speech and dialogue
Identifying words that share the same base meaning is a necessary step in many algorithms of computational linguistics and text processing. For this task we propose a knowledge-poor approach based on an empirical formula that uses the number of coincident letters in the initial parts of two words and the number of non-coincident letters in their final parts. To construct such a formula for a given language, we use the inductive method of self-organization developed by A. Ivahnenko. This method considers a set of models (formulas) of a given class and selects the best ones using training and test samples. We give a detailed example for English. We also show how to apply the formula to create a word frequency list.
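A minimal Python sketch of the ideas above. It assumes a hypothetical linear rule `n > a*s + b`. Here `n` is the number of coincident letters in the shared initial part and `s` is the total number of non-coincident letters left in both final parts. The coefficients `a = 0.5`, `b = 1.0` are illustrative placeholders, not the formula fitted for English. The function names, the greedy grouping in `frequency_list`, and the two-stage `select_model` are also hypothetical. `select_model` is only a simplified stand-in for training/test selection, not Ivahnenko's method itself.

```python
from collections import Counter


def similarity_features(w1: str, w2: str) -> tuple[int, int]:
    # n: number of coincident letters in the shared initial part of the two words
    n = 0
    for x, y in zip(w1, w2):
        if x != y:
            break
        n += 1
    # s: number of non-coincident letters remaining in both final parts
    s = (len(w1) - n) + (len(w2) - n)
    return n, s


def same_base(w1: str, w2: str, a: float = 0.5, b: float = 1.0) -> bool:
    # HYPOTHETICAL rule n > a*s + b; a and b are placeholders, not fitted values
    n, s = similarity_features(w1, w2)
    return n > a * s + b


def frequency_list(tokens, a: float = 0.5, b: float = 1.0) -> Counter:
    # Greedy grouping (an assumption): each token joins the first group whose
    # representative (the first word seen in that group) passes same_base
    counts = Counter()
    reps = []
    for tok in tokens:
        for rep in reps:
            if same_base(rep, tok, a, b):
                counts[rep] += 1
                break
        else:
            reps.append(tok)
            counts[tok] += 1
    return counts


def error_count(params, sample) -> int:
    # Number of labelled word pairs (w1, w2, same?) the rule gets wrong
    a, b = params
    return sum(same_base(w1, w2, a, b) != label for w1, w2, label in sample)


def select_model(candidates, train, test, keep: int = 2):
    # Simplified stand-in: rank candidate (a, b) pairs by training error,
    # keep the best few, then pick the one with the lowest test error
    shortlisted = sorted(candidates, key=lambda p: error_count(p, train))[:keep]
    return min(shortlisted, key=lambda p: error_count(p, test))


if __name__ == "__main__":
    tokens = ["connect", "connected", "connection", "cat", "cats", "car"]
    print(frequency_list(tokens))
    # Counter({'connect': 3, 'cat': 2, 'car': 1})
```

With these placeholder coefficients, "cat" and "car" stay apart because `n = 2`, `s = 2` gives exactly `n = a*s + b`, which the strict inequality rejects. A fitted formula for a real language would set that boundary from training data.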