A knowledge-rich approach to measuring the similarity between Bulgarian and Russian words

Authors:
Svetlin Nakov; Elena Paskaleva; Preslav Nakov
Affiliations:
Sofia University "St. Kliment Ohridski", Sofia, Bulgaria; Bulgarian Academy of Sciences, Sofia, Bulgaria; National University of Singapore, Singapore
Venue:
MRTECEEL '09 Proceedings of the Workshop on Multilingual Resources, Technologies and Evaluation for Central and Eastern European Languages
Year:
2009

Abstract

We propose a novel knowledge-rich approach to measuring the similarity between a pair of words. The algorithm is tailored to Bulgarian and Russian and takes into account the orthographic and phonetic correspondences between the two Slavic languages: it combines lemmatization, hand-crafted transformation rules, and weighted Levenshtein distance. In our experiments, it achieves an 11-point interpolated average precision of 90.58%, a sizeable improvement over two classic competing approaches.
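
To make the weighted Levenshtein component concrete, here is a minimal Python sketch. It is not the authors' implementation: the substitution costs in `SUBSTITUTION_COST`, the unit insertion/deletion cost, and the normalization by the longer word's length are all illustrative assumptions, and the lemmatization and transformation-rule steps named in the abstract are omitted.

```python
# Hypothetical reduced costs for letter pairs that often correspond between
# Bulgarian and Russian. The pairs and values are illustrative assumptions,
# not the weights used in the paper.
SUBSTITUTION_COST = {
    ("ъ", "о"): 0.5,
    ("ъ", "у"): 0.5,
    ("е", "ё"): 0.2,
}


def substitution_cost(a: str, b: str) -> float:
    """Cost of replacing character a with b (symmetric lookup, default 1.0)."""
    if a == b:
        return 0.0
    return SUBSTITUTION_COST.get((a, b), SUBSTITUTION_COST.get((b, a), 1.0))


def weighted_levenshtein(source: str, target: str, indel_cost: float = 1.0) -> float:
    """Edit distance with per-pair substitution costs, computed row by row."""
    previous = [j * indel_cost for j in range(len(target) + 1)]
    for i, s_char in enumerate(source, start=1):
        current = [i * indel_cost]
        for j, t_char in enumerate(target, start=1):
            current.append(min(
                previous[j] + indel_cost,      # delete s_char
                current[j - 1] + indel_cost,   # insert t_char
                previous[j - 1] + substitution_cost(s_char, t_char),
            ))
        previous = current
    return previous[-1]


def similarity(source: str, target: str) -> float:
    """Map the distance to [0, 1]; dividing by the longer length is an assumption."""
    longest = max(len(source), len(target))
    if longest == 0:
        return 1.0
    return 1.0 - weighted_levenshtein(source, target) / longest
```

With these assumed costs, `weighted_levenshtein("сън", "сон")` charges 0.5 for the single ъ/о substitution instead of the 1.0 that plain Levenshtein distance would assign, so the pair scores as more similar than an arbitrary one-letter mismatch.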