Identifying complex sound correspondences in bilingual wordlists

  • Authors: Grzegorz Kondrak

  • Affiliations: Department of Computing Science, University of Alberta, Edmonton, AB, Canada

  • Venue: CICLing'03 Proceedings of the 4th international conference on Computational linguistics and intelligent text processing
  • Year: 2003

Abstract

The determination of recurrent sound correspondences between languages is crucial for the identification of cognates, which are often employed in statistical machine translation for sentence and word alignment. In this paper, an algorithm designed for extracting noncompositional compounds from bitexts is shown to be capable of determining complex sound correspondences in bilingual wordlists. In an experimental evaluation, a C++ implementation of the algorithm achieves approximately 90% recall and precision on authentic language data.