Language comparison through sparse multilingual word alignment

Authors:
Thomas Mayer;Michael Cysouw
Affiliations:
Quantitative Language Comparison, LMU Munich;Philipp University of Marburg
Venue:
EACL 2012 Proceedings of the EACL 2012 Joint Workshop of LINGVIS & UNCLH
Year:
2012

Abstract

In this paper, we propose a novel approach to compare languages on the basis of parallel texts. Instead of using word lists or abstract grammatical characteristics to infer (phylogenetic) relationships, we use multilingual alignments of words in sentences to establish measures of language similarity. To this end, we introduce a new method to quickly infer a multilingual alignment of words, using the co-occurrence of words in a massively parallel text (MPT) to simultaneously align a large number of languages. The idea is that a simultaneous multilingual alignment yields a more adequate clustering of words across different languages than the successive analysis of bilingual alignments. Since the method is computationally demanding for a larger number of languages, we reformulate the problem using sparse matrix calculations. The usefulness of the approach is tested on an MPT that has been extracted from pamphlets of the Jehovah's Witnesses. Our preliminary experiments show that this approach can supplement both the historical and the typological comparison of languages.
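The co-occurrence idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it only counts how often two word types from different languages appear in the same aligned sentence, and it stores only the non-zero counts in a dictionary as a stand-in for a sparse matrix. The function name `cooccurrence_counts`, the toy corpus `toy`, and the whitespace tokenization are assumptions made for illustration.

```python
from collections import defaultdict
from itertools import combinations


def cooccurrence_counts(parallel_text):
    """Count cross-language word co-occurrences in aligned sentences.

    parallel_text maps a language code to a list of sentences, where
    sentence i is assumed to be a translation of sentence i in every
    other language. Only non-zero counts are stored (sparse layout).
    """
    n_sentences = min(len(sents) for sents in parallel_text.values())
    counts = defaultdict(int)
    for i in range(n_sentences):
        # Word types (not tokens) present in sentence i, tagged by language.
        types = sorted(
            {(lang, word)
             for lang, sents in parallel_text.items()
             for word in sents[i].lower().split()}
        )
        for a, b in combinations(types, 2):
            if a[0] != b[0]:  # skip pairs within the same language
                counts[(a, b)] += 1
    return dict(counts)


# Hypothetical two-language toy corpus with two aligned sentences.
toy = {
    "en": ["the dog sleeps", "the cat sleeps"],
    "es": ["el perro duerme", "el gato duerme"],
}
counts = cooccurrence_counts(toy)
```

Word pairs that never share a sentence get no entry at all, which is what keeps the representation small when many languages are compared at once. Raw counts like these would still need to be turned into an association measure before any alignment or clustering step.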