Revealing phonological similarities between related languages from automatically generated parallel corpora

Authors:
Karin Müller
Affiliations:
University of Amsterdam, Amsterdam, The Netherlands
Venue:
ParaText '05 Proceedings of the ACL Workshop on Building and Using Parallel Texts
Year:
2005

Abstract

In this paper, we present an approach to automatically revealing phonological correspondences between historically related languages. We create two bilingual pronunciation dictionaries, one for each of the language pairs German-Dutch and German-English. The data is used to learn phonological similarities within each language pair automatically via EM-based clustering. We apply our models to predict, from a phonologically transcribed German word, the phonemes of its Dutch and English cognates. The similarity scores show that German and Dutch phonemes are more similar than German and English phonemes, which supplies statistical evidence for the common knowledge that German is more closely related to Dutch than to English. We assess our approach qualitatively and find meaningful classes that reflect historical sound changes. The classes can be used for language learning.
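
To make the idea of EM-based clustering over phoneme pairs more concrete, the following is a minimal sketch, not the paper's implementation. It assumes a simple latent-class model p(g, d) = sum over classes c of p(c) p(g|c) p(d|c), fitted to (German phoneme, Dutch phoneme, count) triples. The function names em_cluster and predict, the number of classes, and the toy counts are all hypothetical and chosen only for illustration; the abstract does not specify the model at this level of detail.

```python
import random
from collections import defaultdict


def em_cluster(pairs, n_classes=3, n_iter=50, seed=0):
    """Fit p(c), p(src|c), p(tgt|c) to (src, tgt, count) triples with EM.

    Assumed model (not taken from the paper): p(src, tgt) = sum_c p(c) p(src|c) p(tgt|c).
    """
    rng = random.Random(seed)
    src_syms = sorted({s for s, _, _ in pairs})
    tgt_syms = sorted({t for _, t, _ in pairs})

    def random_dist(symbols):
        # Random, strictly positive starting distribution to break symmetry.
        weights = [rng.random() + 0.1 for _ in symbols]
        total = sum(weights)
        return {sym: w / total for sym, w in zip(symbols, weights)}

    p_c = [1.0 / n_classes] * n_classes
    p_s = [random_dist(src_syms) for _ in range(n_classes)]
    p_t = [random_dist(tgt_syms) for _ in range(n_classes)]

    for _ in range(n_iter):
        # E-step: distribute each pair's count over classes by posterior p(c | src, tgt).
        c_count = [0.0] * n_classes
        s_count = [defaultdict(float) for _ in range(n_classes)]
        t_count = [defaultdict(float) for _ in range(n_classes)]
        for s, t, n in pairs:
            joint = [p_c[c] * p_s[c][s] * p_t[c][t] for c in range(n_classes)]
            z = sum(joint)
            if z == 0.0:
                continue
            for c in range(n_classes):
                r = n * joint[c] / z
                c_count[c] += r
                s_count[c][s] += r
                t_count[c][t] += r

        # M-step: re-estimate all parameters from the expected counts.
        total = sum(c_count)
        for c in range(n_classes):
            p_c[c] = c_count[c] / total
            if c_count[c] > 0.0:
                p_s[c] = {s: s_count[c][s] / c_count[c] for s in src_syms}
                p_t[c] = {t: t_count[c][t] / c_count[c] for t in tgt_syms}
    return p_c, p_s, p_t


def predict(src, p_c, p_s, p_t):
    """Return the target phoneme with the highest joint score p(src, tgt), and that score."""
    scores = defaultdict(float)
    for c in range(len(p_c)):
        for t, prob_t in p_t[c].items():
            scores[t] += p_c[c] * p_s[c].get(src, 0.0) * prob_t
    best = max(scores, key=scores.get)
    return best, scores[best]


# Toy German-Dutch phoneme pairs; the counts are invented for illustration only.
toy_pairs = [("ts", "t", 10), ("f", "p", 8), ("x", "k", 6)]
p_c, p_s, p_t = em_cluster(toy_pairs, n_classes=3)
print(predict("ts", p_c, p_s, p_t))
```

In this sketch, the joint score returned by predict plays the role of a similarity score between a German phoneme and a candidate Dutch phoneme; the same procedure could be run on German-English pairs to compare the two language pairs.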