Improved reconstruction of protolanguage word forms

Authors:
Alexandre Bouchard-Côté;Thomas L. Griffiths;Dan Klein
Affiliations:
University of California at Berkeley, Berkeley, CA;University of California at Berkeley, Berkeley, CA;University of California at Berkeley, Berkeley, CA
Venue:
NAACL '09 Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics
Year:
2009

Citing (7):
On the limited memory BFGS method for large scale optimization. Mathematical Programming: Series A and B
The reconstruction engine: a computer implementation of the comparative method. Computational Linguistics - Special issue on computational phonology
Algorithms for language reconstruction
A new algorithm for the alignment of phonetic sequences. NAACL 2000 Proceedings of the 1st North American chapter of the Association for Computational Linguistics conference
Alignment of multiple languages for historical comparison. COLING '98 Proceedings of the 17th international conference on Computational linguistics - Volume 1
An application of computer programming to the reconstruction of a proto-language. COLING '69 Proceedings of the 1969 conference on Computational linguistics
Latent-variable modeling of string transductions with finite-state methods. EMNLP '08 Proceedings of the Conference on Empirical Methods in Natural Language Processing

Cited by (6):
Graphical models over multiple strings. EMNLP '09 Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing: Volume 1
Finding cognate groups using phylogenies. ACL '10 Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics
Levenshtein distances fail to identify language relationships accurately. Computational Linguistics
Simple effective decipherment via combinatorial optimization. EMNLP '11 Proceedings of the Conference on Empirical Methods in Natural Language Processing
Large-scale cognate recovery. EMNLP '11 Proceedings of the Conference on Empirical Methods in Natural Language Processing
Software helps linguists reconstruct, decipher ancient languages. Communications of the ACM

Quantified Score

Hi-index	0.02

Abstract

We present an unsupervised approach to reconstructing ancient word forms. The present work addresses three limitations of previous work. First, previous work focused on faithfulness features, which model changes between successive languages. We add markedness features, which model well-formedness within each language. Second, we introduce universal features, which support generalizations across languages. Finally, we increase the number of languages to which these methods can be applied by an order of magnitude by using improved inference methods. Experiments on the reconstruction of Proto-Oceanic, Proto-Malayo-Javanic, and Classical Latin show substantial reductions in error rate, giving the best results to date.
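
The three feature types named in the abstract can be illustrated with a small sketch. It assumes a log-linear (softmax) distribution over the child-language character given the parent-language character, which the abstract itself does not specify. The function names, feature tuples, language labels (lang_A, lang_B), alphabet, and weights are all hypothetical and chosen only for illustration.

```python
import math
from collections import defaultdict


def features(lang, parent_char, prev_child_char, child_char):
    """Return the indicator features active for one character-level change."""
    # Faithfulness: how a character changes between successive languages.
    faith = ("faith", parent_char, child_char)
    # Markedness: well-formedness within the child language (here, a bigram).
    marked = ("marked", prev_child_char, child_char)
    active = []
    for f in (faith, marked):
        active.append((lang,) + f)         # language-specific copy
        active.append(("UNIVERSAL",) + f)  # copy shared across all languages
    return active


def change_distribution(weights, lang, parent_char, prev_child_char, alphabet):
    """Softmax over candidate child characters (assumed log-linear form)."""
    scores = {
        c: sum(weights[f] for f in features(lang, parent_char, prev_child_char, c))
        for c in alphabet
    }
    top = max(scores.values())  # subtract the max for numerical stability
    z = sum(math.exp(s - top) for s in scores.values())
    return {c: math.exp(s - top) / z for c, s in scores.items()}


# Hypothetical weights: one universal feature and one specific to lang_A.
weights = defaultdict(float)
weights[("UNIVERSAL", "faith", "p", "p")] = 2.0
weights[("lang_A", "faith", "p", "f")] = 1.0

alphabet = ["p", "f", "a"]
dist_a = change_distribution(weights, "lang_A", "p", "a", alphabet)
dist_b = change_distribution(weights, "lang_B", "p", "a", alphabet)
```

In this toy setup the universal weight raises the probability of keeping "p" in both languages, while only lang_A gives extra probability to "p" changing to "f". That is the kind of cross-language generalization the universal features are meant to support.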