Computing word similarity and identifying cognates with pair hidden Markov models

Authors:
Wesley Mackay;Grzegorz Kondrak
Affiliations:
University of Alberta, Edmonton, Alberta, Canada;University of Alberta, Edmonton, Alberta, Canada
Venue:
CONLL '05 Proceedings of the Ninth Conference on Computational Natural Language Learning
Year:
2005

Citing 16
Cited 9

Phonetic string matching: lessons from information retrieval

SIGIR '96 Proceedings of the 19th annual international ACM SIGIR conference on Research and development in information retrieval
Learning String-Edit Distance

IEEE Transactions on Pattern Analysis and Machine Intelligence
Statistical methods for speech recognition

Statistical methods for speech recognition
Foundations of statistical natural language processing

Foundations of statistical natural language processing
The String-to-String Correction Problem

Journal of the ACM (JACM)
Approximate String Matching

ACM Computing Surveys (CSUR)
Fuzzy translation of cross-lingual spelling variants

Proceedings of the 26th annual international ACM SIGIR conference on Research and development in informaion retrieval
Bitext maps and alignment via pattern recognition

Computational Linguistics
A new algorithm for the alignment of phonetic sequences

NAACL 2000 Proceedings of the 1st North American chapter of the Association for Computational Linguistics conference
Alignment of multiple languages for historical comparison

COLING '98 Proceedings of the 17th international conference on Computational linguistics - Volume 1
Linguistic variation and computation

EACL '03 Proceedings of the tenth conference on European chapter of the Association for Computational Linguistics - Volume 1
Combining clues for word alignment

EACL '03 Proceedings of the tenth conference on European chapter of the Association for Computational Linguistics - Volume 1
Multipath translation lexicon induction via bridge languages

NAACL '01 Proceedings of the second meeting of the North American Chapter of the Association for Computational Linguistics on Language technologies
Improved statistical alignment models

ACL '00 Proceedings of the 38th Annual Meeting on Association for Computational Linguistics
Significance tests for the evaluation of ranking methods

COLING '04 Proceedings of the 20th international conference on Computational Linguistics
Identification of confusable drug names: a new approach and evaluation methodology

COLING '04 Proceedings of the 20th international conference on Computational Linguistics

Induction of cross-language affix and letter sequence correspondence

CrossLangInduction '06 Proceedings of the International Workshop on Cross-Language Knowledge Induction
Multiple word alignment with profile hidden Markov models

SRWS '09 Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, Companion Volume: Student Research Workshop and Doctoral Consortium
Computing and historical phonology

SigMorPhon '07 Proceedings of Ninth Meeting of the ACL Special Interest Group in Computational Morphology and Phonology
Inducing sound segment differences using Pair Hidden Markov Models

SigMorPhon '07 Proceedings of Ninth Meeting of the ACL Special Interest Group in Computational Morphology and Phonology
Evaluation of several phonetic similarity algorithms on the task of cognate identification

LD '06 Proceedings of the Workshop on Linguistic Distances
Evaluating the pairwise string alignment of pronunciations

LaTeCH-SHELT&R '09 Proceedings of the EACL 2009 Workshop on Language Technology and Resources for Cultural Heritage, Social Sciences, Humanities, and Education
Transliteration system using pair HMM with weighted FSTs

NEWS '09 Proceedings of the 2009 Named Entities Workshop: Shared Task on Transliteration
Mining transliterations from Wikipedia using pair HMMs

NEWS '10 Proceedings of the 2010 Named Entities Workshop
Similarity patterns in words

EACL 2012 Proceedings of the EACL 2012 Joint Workshop of LINGVIS & UNCLH

Quantified Score

Hi-index	0.00

Visualization

Abstract

We present a system for computing similarity between pairs of words. Our system is based on Pair Hidden Markov Models, a variation on Hidden Markov Models that has been used successfully for the alignment of biological sequences. The parameters of the model are automatically learned from training data that consists of word pairs known to be similar. Our tests focus on the identification of cognates --- words of common origin in related languages. The results show that our system outperforms previously proposed techniques.