A scalable decoder for parsing-based machine translation with equivalent language model state maintenance

  • Authors:
  • Zhifei Li; Sanjeev Khudanpur

  • Affiliations:
  • Johns Hopkins University, Baltimore, MD; Johns Hopkins University, Baltimore, MD

  • Venue:
  • SSST '08 Proceedings of the Second Workshop on Syntax and Structure in Statistical Translation
  • Year:
  • 2008

Abstract

We describe a scalable decoder for parsing-based machine translation. The decoder is written in Java and implements all the essential algorithms described in Chiang (2007): chart parsing, m-gram language model integration, beam- and cube-pruning, and unique k-best extraction. Additionally, parallel and distributed computing techniques are exploited to make it scalable. We also propose an algorithm to maintain equivalent language model states that exploits the back-off property of m-gram language models: instead of maintaining a separate state for each distinct sequence of "state" words, we merge multiple states that can be made equivalent for language model probability calculations due to back-off. We demonstrate experimentally that our decoder is more than 30 times faster than a baseline decoder written in Python. We propose to release our decoder as an open-source toolkit.
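To illustrate the equivalent-state idea, the sketch below shows one way such state merging can work for the right-side boundary words of a hypothesis. It is a simplified Python illustration under assumed conventions (the function name, the set-of-tuples model representation, and the ARPA-style convention that an unseen context has back-off weight 1 are all ours), not the paper's actual Java implementation: if the boundary words never occur in the model as an n-gram or a prefix of one, every future probability query backs off past the oldest word with weight 1, so that word can be dropped without changing any score.

```python
def equivalent_right_state(state, ngrams, order=3):
    """Shorten a right-side LM state when back-off makes the longer state
    indistinguishable for all future probability queries.

    state:  tuple of boundary words (at most order-1 are relevant)
    ngrams: set of n-gram tuples present in the language model
    order:  m of the m-gram language model

    NOTE: hypothetical sketch; assumes unseen contexts carry back-off weight 1,
    as in the ARPA back-off convention.
    """
    state = tuple(state)[-(order - 1):]   # an m-gram LM conditions on m-1 words
    while state:
        # If `state` occurs in the model (as an n-gram or as a prefix of one),
        # it can influence a future probability or back-off weight: keep it.
        if any(g[:len(state)] == state for g in ngrams if len(g) >= len(state)):
            break
        # Otherwise every future query backs off past the oldest word with
        # weight 1, so dropping that word yields an equivalent, shorter state.
        state = state[1:]
    return state
```

Because more hypotheses now share identical (shortened) states, they are recombined earlier during chart parsing, which shrinks the search space the m-gram integration has to explore.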