Revealing phonological similarities between related languages from automatically generated parallel corpora

  • Authors:
  • Karin Müller

  • Affiliations:
  • University of Amsterdam, Amsterdam, The Netherlands

  • Venue:
  • ParaText '05 Proceedings of the ACL Workshop on Building and Using Parallel Texts
  • Year:
  • 2005

Abstract

In this paper, we present an approach to automatically revealing phonological correspondences between historically related languages. We create two bilingual pronunciation dictionaries, one for German-Dutch and one for German-English. The data are used to automatically learn phonological similarities within each language pair via EM-based clustering. We apply our models to predict, from the phonemic transcription of a German word, the phonemes of its Dutch and English cognates. The similarity scores show that German and Dutch phonemes are more similar than German and English phonemes, which provides statistical evidence for the widely held view that German is more closely related to Dutch than to English. A qualitative assessment of our approach reveals meaningful classes that reflect historical sound changes. These classes can be used for language learning.
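The abstract names EM-based clustering but does not spell out the model. A minimal sketch, assuming a two-dimensional latent class model over aligned (German, Dutch) phoneme pairs, P(g, d) = Σ_c P(c) P(g|c) P(d|c), with prediction via P(d | g) = Σ_c P(c) P(g|c) P(d|c) / Σ_c P(c) P(g|c). The function names `fit_latent_classes` and `predict`, the one-phoneme-to-one-phoneme alignment, and the toy counts are hypothetical illustrations, not the paper's actual model, features, or data.

```python
import math
import random
from collections import Counter


def fit_latent_classes(pairs, num_classes=4, iterations=50, seed=0):
    """Fit P(g, d) = sum_c P(c) P(g|c) P(d|c) to (German, Dutch) phoneme pairs with EM.

    Hypothetical sketch: returns the class prior, the per-class phoneme
    distributions and the log-likelihood recorded before each M-step.
    """
    counts = Counter(pairs)
    g_vocab = sorted({g for g, _ in counts})
    d_vocab = sorted({d for _, d in counts})
    classes = range(num_classes)
    rng = random.Random(seed)

    def normalize(weights):
        total = sum(weights.values())
        return {k: v / total for k, v in weights.items()}

    # Random (seeded) initialisation breaks the symmetry between classes.
    p_c = normalize({c: rng.random() + 0.5 for c in classes})
    p_g = [normalize({g: rng.random() + 0.5 for g in g_vocab}) for _ in classes]
    p_d = [normalize({d: rng.random() + 0.5 for d in d_vocab}) for _ in classes]

    history = []
    for _ in range(iterations):
        exp_c = {c: 0.0 for c in classes}
        exp_g = [dict.fromkeys(g_vocab, 0.0) for _ in classes]
        exp_d = [dict.fromkeys(d_vocab, 0.0) for _ in classes]
        loglik = 0.0
        # E-step: posterior P(c | g, d) for each pair type, weighted by its count n.
        for (g, d), n in counts.items():
            joint = [p_c[c] * p_g[c][g] * p_d[c][d] for c in classes]
            total = sum(joint)
            loglik += n * math.log(total)
            for c in classes:
                w = n * joint[c] / total
                exp_c[c] += w
                exp_g[c][g] += w
                exp_d[c][d] += w
        history.append(loglik)
        # M-step: re-estimate all distributions from the expected counts.
        p_c = normalize(exp_c)
        p_g = [normalize(exp_g[c]) for c in classes]
        p_d = [normalize(exp_d[c]) for c in classes]
    return p_c, p_g, p_d, history


def predict(p_c, p_g, p_d, german):
    """Return P(d | g) for a German phoneme g that was seen in training."""
    scores, denom = {}, 0.0
    for c, prior in p_c.items():
        w = prior * p_g[c][german]
        denom += w
        for d, p in p_d[c].items():
            scores[d] = scores.get(d, 0.0) + w * p
    return {d: s / denom for d, s in scores.items()}


# Invented toy counts loosely modelled on German/Dutch consonant correspondences
# (e.g. Zeit/tijd, Pfeffer/peper, machen/maken); NOT data from the paper.
pairs = ([("ts", "t")] * 6 + [("t", "t")] * 8 + [("pf", "p")] * 3 + [("p", "p")] * 5
         + [("x", "k")] * 4 + [("k", "k")] * 7 + [("s", "s")] * 5)
p_c, p_g, p_d, history = fit_latent_classes(pairs, num_classes=4)
dist = predict(p_c, p_g, p_d, "ts")
print(max(dist, key=dist.get), round(max(dist.values()), 2))
```

With a single class the model reduces to independence between the two languages, so `predict` just returns the Dutch phoneme marginal; additional classes are what allow it to capture pair-specific correspondences such as German /ts/ to Dutch /t/.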