Identifying cognates by phonetic and semantic similarity

  • Authors:
  • Grzegorz Kondrak

  • Affiliations:
  • University of Toronto, Toronto, Ontario, Canada

  • Venue:
  • NAACL '01 Proceedings of the second meeting of the North American Chapter of the Association for Computational Linguistics on Language technologies
  • Year:
  • 2001

Quantified Score

Hi-index 0.00

Visualization

Abstract

I present a method of identifying cognates in the vocabularies of related languages. I show that a measure of phonetic similarity based on multivalued features performs better than "orthographic" measures, such as the Longest Common Subsequence Ratio (LCSR) or Dice's coefficient. I introduce a procedure for estimating semantic similarity of glosses that employs keyword selection and WordNet. Tests performed on vocabularies of four Algonquian languages indicate that the method is capable of discovering on average nearly 75% percent of cognates at 50% precision.