Identifying cognates by phonetic and semantic similarity

Author:
Grzegorz Kondrak
Affiliations:
University of Toronto, Toronto, Ontario, Canada
Venue:
NAACL '01 Proceedings of the second meeting of the North American Chapter of the Association for Computational Linguistics on Language technologies
Year:
2001

Abstract

I present a method of identifying cognates in the vocabularies of related languages. I show that a measure of phonetic similarity based on multivalued features performs better than "orthographic" measures, such as the Longest Common Subsequence Ratio (LCSR) or Dice's coefficient. I introduce a procedure for estimating the semantic similarity of glosses that employs keyword selection and WordNet. Tests performed on the vocabularies of four Algonquian languages indicate that the method can discover, on average, nearly 75% of cognates at 50% precision.
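For readers unfamiliar with the two "orthographic" baselines named above, here is a minimal sketch of how they are commonly computed. It is not the paper's phonetic measure, which relies on multivalued phonetic features and is not reproduced here. Two assumptions apply. LCSR is taken as the length of the longest common subsequence divided by the length of the longer word. Dice's coefficient is taken over character bigrams, with shared bigrams counted as a multiset. The word pair in the usage lines is illustrative only and does not come from the Algonquian data.

```python
from collections import Counter


def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence (classic dynamic programming)."""
    # prev[j] holds the LCS length of the prefix of `a` processed so far and b[:j]
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0] * (len(b) + 1)
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr[j] = prev[j - 1] + 1
            else:
                curr[j] = max(prev[j], curr[j - 1])
        prev = curr
    return prev[-1]


def lcsr(a: str, b: str) -> float:
    """Longest Common Subsequence Ratio: LCS length over the longer word's length."""
    longest = max(len(a), len(b))
    return lcs_length(a, b) / longest if longest else 0.0


def bigrams(word: str) -> Counter:
    """Multiset of adjacent character pairs in `word`."""
    return Counter(word[i:i + 2] for i in range(len(word) - 1))


def dice(a: str, b: str) -> float:
    """Dice's coefficient over character bigrams (assumed multiset variant)."""
    ba, bb = bigrams(a), bigrams(b)
    total = sum(ba.values()) + sum(bb.values())
    shared = sum((ba & bb).values())  # Counter & keeps the minimum count per bigram
    return 2 * shared / total if total else 0.0


if __name__ == "__main__":
    print(round(lcsr("colour", "color"), 4))  # 0.8333 (LCS "color" = 5, longer word = 6)
    print(round(dice("colour", "color"), 4))  # 0.6667 (3 shared bigrams out of 5 + 4)
```

Both scores look only at shared letters. This is the limitation the abstract points to: neither measure can recognize that two different but phonetically close segments are similar.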