Hierarchical Bayesian language modelling for the linguistically informed

Authors:
Jan A. Botha
Affiliations:
University of Oxford, UK
Venue:
EACL '12 Proceedings of the Student Research Workshop at the 13th Conference of the European Chapter of the Association for Computational Linguistics
Year:
2012

Citing 12
Cited 0

Empirical methods for compound splitting

EACL '03 Proceedings of the tenth conference on European chapter of the Association for Computational Linguistics - Volume 1
Factored language models and generalized parallel backoff

NAACL-Short '03 Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology: companion volume of the Proceedings of HLT-NAACL 2003--short papers - Volume 2
Minimum error rate training in statistical machine translation

ACL '03 Proceedings of the 41st Annual Meeting on Association for Computational Linguistics - Volume 1
A hierarchical Bayesian language model based on Pitman-Yor processes

ACL-44 Proceedings of the 21st International Conference on Computational Linguistics and the 44th annual meeting of the Association for Computational Linguistics
Alignment by agreement

HLT-NAACL '06 Proceedings of the main conference on Human Language Technology Conference of the North American Chapter of the Association of Computational Linguistics
Hierarchical Phrase-Based Translation

Computational Linguistics
A comparison of merging strategies for translation of German compounds

EACL '09 Proceedings of the 12th Conference of the European Chapter of the Association for Computational Linguistics: Student Research Workshop
Using a maximum entropy model to build segmentation lattices for MT

NAACL '09 Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics
Towards better machine translation quality for the German--English language pairs

StatMT '08 Proceedings of the Third Workshop on Statistical Machine Translation
Bayesian unsupervised word segmentation with nested Pitman-Yor language modeling

ACL '09 Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP: Volume 1 - Volume 1
cdec: a decoder, alignment, and learning framework for finite-state and context-free translation models

ACLDemos '10 Proceedings of the ACL 2010 System Demonstrations
Hierarchical Pitman-Yor Language Model for Machine Translation

IALP '10 Proceedings of the 2010 International Conference on Asian Language Processing

Quantified Score

Hi-index	0.00

Visualization

Abstract

In this work I address the challenge of augmenting n-gram language models according to prior linguistic intuitions. I argue that the family of hierarchical Pitman-Yor language models is an attractive vehicle through which to address the problem, and demonstrate the approach by proposing a model for German compounds. In an empirical evaluation, the model outperforms the Kneser-Ney model in terms of perplexity, and achieves preliminary improvements in English-German translation.