Continuous space translation models with neural networks

Authors:
Le Hai Son;Alexandre Allauzen;François Yvon
Affiliations:
Univ. Paris-Sud, France and LIMSI/CNRS, Orsay cedex, France;Univ. Paris-Sud, France and LIMSI/CNRS, Orsay cedex, France;Univ. Paris-Sud, France and LIMSI/CNRS, Orsay cedex, France
Venue:
NAACL HLT '12 Proceedings of the 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies
Year:
2012

Citing 20
Cited 2

Class-based n-gram models of natural language
Computational Linguistics

Phrase-Based Statistical Machine Translation
KI '02 Proceedings of the 25th Annual German Conference on AI: Advances in Artificial Intelligence

A neural probabilistic language model
The Journal of Machine Learning Research

The mathematics of statistical machine translation: parameter estimation
Computational Linguistics - Special issue on using large corpora: II

An empirical study of smoothing techniques for language modeling
ACL '96 Proceedings of the 34th annual meeting on Association for Computational Linguistics

Factored language models and generalized parallel backoff
NAACL-Short '03 Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology: companion volume of the Proceedings of HLT-NAACL 2003--short papers - Volume 2

Minimum error rate training in statistical machine translation
ACL '03 Proceedings of the 41st Annual Meeting on Association for Computational Linguistics - Volume 1

The Alignment Template Approach to Statistical Machine Translation
Computational Linguistics

Machine Translation with Inferred Stochastic Finite-State Transducers
Computational Linguistics

A hierarchical Bayesian language model based on Pitman-Yor processes
ACL-44 Proceedings of the 21st International Conference on Computational Linguistics and the 44th annual meeting of the Association for Computational Linguistics

Continuous space language models
Computer Speech and Language

N-gram-based Machine Translation
Computational Linguistics

Improving statistical MT by coupling reordering and decoding
Machine Translation

Moses: open source toolkit for statistical machine translation
ACL '07 Proceedings of the 45th Annual Meeting of the ACL on Interactive Poster and Demonstration Sessions

Phrasetable smoothing for statistical machine translation
EMNLP '06 Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing

A unigram orientation model for statistical machine translation
HLT-NAACL-Short '04 Proceedings of HLT-NAACL 2004: Short Papers

Factored bilingual n-gram language models for statistical machine translation
Machine Translation

Natural Language Processing (Almost) from Scratch
The Journal of Machine Learning Research

Wider context by using bilingual language models in machine translation
WMT '11 Proceedings of the Sixth Workshop on Statistical Machine Translation

LIMSI @ WMT11
WMT '11 Proceedings of the Sixth Workshop on Statistical Machine Translation


Joint WMT 2012 submission of the QUAERO project
WMT '12 Proceedings of the Seventh Workshop on Statistical Machine Translation

LIMSI @ WMT'12
WMT '12 Proceedings of the Seventh Workshop on Statistical Machine Translation

Abstract

The use of conventional maximum likelihood estimates hinders the performance of existing phrase-based translation models. Because training data are limited, most models consider only a small amount of context. As a partial remedy, we explore several continuous space translation models, in which translation probabilities are estimated from a continuous representation of translation units instead of the standard discrete representations. To handle a large set of translation units, these representations and the associated estimates are jointly computed by a multi-layer neural network with a SOUL architecture. In small-scale and large-scale English-to-French experiments, we show that the resulting models can be trained effectively and used on top of an n-gram translation system, delivering significant improvements in performance.
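
As a rough illustration of the idea in the abstract, the sketch below estimates the probability of a translation unit from continuous representations of its context units. It is not the authors' implementation: the abstract does not describe the SOUL architecture, so the sketch assumes a simplified two-level factorization, P(unit | context) = P(class | context) × P(unit | class, context), which is one common way to keep the output layer tractable for a large inventory of units. The class name `ClassFactoredNgramModel`, the round-robin class assignment, the layer sizes, and the random (untrained) weights are all hypothetical.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(matrix, vector):
    """Multiply a list-of-rows matrix by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def rand_matrix(rows, cols, rng):
    """Small random weights; a real model would learn these by training."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


class ClassFactoredNgramModel:
    """Untrained feed-forward n-gram model with a class-factored output layer."""

    def __init__(self, vocab_size, num_classes, context_size=2,
                 embed_dim=4, hidden_dim=8, seed=0):
        rng = random.Random(seed)
        self.context_size = context_size
        # Assumption: units are assigned to classes round-robin by integer id.
        self.unit_class = [u % num_classes for u in range(vocab_size)]
        self.class_members = [
            [u for u in range(vocab_size) if u % num_classes == c]
            for c in range(num_classes)
        ]
        # One continuous vector per translation unit.
        self.embeddings = rand_matrix(vocab_size, embed_dim, rng)
        self.hidden_weights = rand_matrix(hidden_dim, context_size * embed_dim, rng)
        # Top-level softmax over classes, plus one small softmax per class.
        self.class_weights = rand_matrix(num_classes, hidden_dim, rng)
        self.unit_weights = [
            rand_matrix(len(members), hidden_dim, rng)
            for members in self.class_members
        ]

    def hidden(self, context):
        """Concatenate the context embeddings and apply a tanh hidden layer."""
        if len(context) != self.context_size:
            raise ValueError("context must contain exactly context_size unit ids")
        x = [value for unit in context for value in self.embeddings[unit]]
        return [math.tanh(a) for a in matvec(self.hidden_weights, x)]

    def prob(self, unit, context):
        """Return P(unit | context) = P(class | context) * P(unit | class, context)."""
        h = self.hidden(context)
        c = self.unit_class[unit]
        p_class = softmax(matvec(self.class_weights, h))[c]
        p_within = softmax(matvec(self.unit_weights[c], h))
        return p_class * p_within[self.class_members[c].index(unit)]


model = ClassFactoredNgramModel(vocab_size=10, num_classes=3)
total = sum(model.prob(u, [1, 2]) for u in range(10))
```

Because each class-level and within-class distribution is normalized, the probabilities over all units sum to one for any context, while scoring a single unit only requires evaluating the class softmax and the softmax of that unit's class rather than the full inventory.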