cdec: a decoder, alignment, and learning framework for finite-state and context-free translation models

Authors:
Chris Dyer;Jonathan Weese;Hendra Setiawan;Adam Lopez;Ferhan Ture;Vladimir Eidelman;Juri Ganitkevitch;Phil Blunsom;Philip Resnik
Affiliations:
University of Maryland;Johns Hopkins University;University of Maryland;University of Edinburgh;University of Maryland;University of Maryland;Johns Hopkins University;Oxford University;University of Maryland
Venue:
ACLDemos '10 Proceedings of the ACL 2010 System Demonstrations
Year:
2010

Citing 19
Cited 32

On the limited memory BFGS method for large scale optimization

Mathematical Programming: Series A and B
Weighted deductive parsing and Knuth's algorithm

Computational Linguistics
The mathematics of statistical machine translation: parameter estimation

Computational Linguistics - Special issue on using large corpora: II
BLEU: a method for automatic evaluation of machine translation

ACL '02 Proceedings of the 40th Annual Meeting on Association for Computational Linguistics
Statistical phrase-based translation

NAACL '03 Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology - Volume 1
Shallow parsing with conditional random fields

NAACL '03 Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology - Volume 1
Minimum error rate training in statistical machine translation

ACL '03 Proceedings of the 41st Annual Meeting on Association for Computational Linguistics - Volume 1
MapReduce: simplified data processing on large clusters

OSDI'04 Proceedings of the 6th conference on Symposium on Opearting Systems Design & Implementation - Volume 6
Hierarchical Phrase-Based Translation

Computational Linguistics
Moses: open source toolkit for statistical machine translation

ACL '07 Proceedings of the 45th Annual Meeting of the ACL on Interactive Poster and Demonstration Sessions
Translation as weighted deduction

EACL '09 Proceedings of the 12th Conference of the European Chapter of the Association for Computational Linguistics
Probabilistic inference for machine translation

EMNLP '08 Proceedings of the Conference on Empirical Methods in Natural Language Processing
11,001 new features for statistical machine translation

NAACL '09 Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics
Joshua: an open source toolkit for parsing-based machine translation

StatMT '09 Proceedings of the Fourth Workshop on Statistical Machine Translation
Better k-best parsing

Parsing '05 Proceedings of the Ninth International Workshop on Parsing Technology
Efficient Minimum Error Rate Training and Minimum Bayes-Risk decoding for translation hypergraphs and lattices

ACL '09 Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP: Volume 1 - Volume 1
First- and second-order expectation semirings with applications to minimum-risk training on translation forests

EMNLP '09 Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing: Volume 1 - Volume 1
Two monolingual parses are better than one (synchronous parse)

HLT '10 Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics
Context-free reordering, finite-state translation

HLT '10 Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics

The University of Maryland statistical machine translation system for the Fifth Workshop on Machine Translation

WMT '10 Proceedings of the Joint Fifth Workshop on Statistical Machine Translation and MetricsMATR
Reproducible results in parsing-based machine translation: the JHU shared task submission

WMT '10 Proceedings of the Joint Fifth Workshop on Statistical Machine Translation and MetricsMATR
Discriminative word alignment with a function word reordering model

EMNLP '10 Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing
Unsupervised word alignment with arbitrary features

HLT '11 Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1
Translating from morphologically complex languages: a paraphrase-based approach

HLT '11 Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1
Better hypothesis testing for statistical machine translation: controlling for optimizer instability

HLT '11 Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies: short papers - Volume 2
Utilizing target-side semantic role labels to assist hierarchical phrase-based machine translation

SSST-5 Proceedings of the Fifth Workshop on Syntax, Semantics and Structure in Statistical Translation
CMU system combination in WMT 2011

WMT '11 Proceedings of the Sixth Workshop on Statistical Machine Translation
KenLM: faster and smaller language model queries

WMT '11 Proceedings of the Sixth Workshop on Statistical Machine Translation
The CMU-ARK German-English translation system

WMT '11 Proceedings of the Sixth Workshop on Statistical Machine Translation
Noisy SMS machine translation in low-density languages

WMT '11 Proceedings of the Sixth Workshop on Statistical Machine Translation
Joshua 3.0: syntax-based machine translation with the Thrax grammar extractor

WMT '11 Proceedings of the Sixth Workshop on Statistical Machine Translation
Multiword expression identification with tree substitution grammars: a parsing tour de force with French

EMNLP '11 Proceedings of the Conference on Empirical Methods in Natural Language Processing
Heuristic search for non-bottom-up tree structure prediction

EMNLP '11 Proceedings of the Conference on Empirical Methods in Natural Language Processing
Learning sentential paraphrases from bilingual parallel corpora for text-to-text generation

EMNLP '11 Proceedings of the Conference on Empirical Methods in Natural Language Processing
Looking inside the box: context-sensitive translation for cross-language information retrieval

SIGIR '12 Proceedings of the 35th international ACM SIGIR conference on Research and development in information retrieval
Hierarchical Bayesian language modelling for the linguistically informed

EACL '12 Proceedings of the Student Research Workshop at the 13th Conference of the European Chapter of the Association for Computational Linguistics
Encouraging consistent translation choices

NAACL HLT '12 Proceedings of the 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies
Why not grab a free lunch?: mining large corpora for parallel sentences to improve translation modeling

NAACL HLT '12 Proceedings of the 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies
NiuTrans: an open source toolkit for phrase-based and syntax-based machine translation

ACL '12 Proceedings of the ACL 2012 System Demonstrations
Joint feature selection in distributed stochastic learning for large-scale discriminative training in SMT

ACL '12 Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics: Long Papers - Volume 1
Topic models for dynamic translation model adaptation

ACL '12 Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics: Short Papers - Volume 2
Heuristic cube pruning in linear time

ACL '12 Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics: Short Papers - Volume 2
A bayesian model for learning SCFGs with discontiguous rules

EMNLP-CoNLL '12 Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning
Language model rest costs and space-efficient storage

EMNLP-CoNLL '12 Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning
Transliteration by sequence labeling with lattice encodings and reranking

NEWS '12 Proceedings of the 4th Named Entity Workshop
The CMU-avenue French-English translation system

WMT '12 Proceedings of the Seventh Workshop on Statistical Machine Translation
Syntax-aware phrase-based statistical machine translation: system description

WMT '12 Proceedings of the Seventh Workshop on Statistical Machine Translation
Optimization strategies for online large-margin learning in machine translation

WMT '12 Proceedings of the Seventh Workshop on Statistical Machine Translation
Parsing models for identifying multiword expressions

Computational Linguistics
Using targeted paraphrasing and monolingual crowdsourcing to improve translation

ACM Transactions on Intelligent Systems and Technology (TIST) - Special Sections on Paraphrasing; Intelligent Systems for Socially Aware Computing; Social Computing, Behavioral-Cultural Modeling, and Prediction
Flat vs. hierarchical phrase-based translation models for cross-language information retrieval

Proceedings of the 36th international ACM SIGIR conference on Research and development in information retrieval

Quantified Score

Hi-index	0.00

Visualization

Abstract

We present cdec, an open source framework for decoding, aligning with, and training a number of statistical machine translation models, including word-based models, phrase-based models, and models based on synchronous context-free grammars. Using a single unified internal representation for translation forests, the decoder strictly separates model-specific translation logic from general rescoring, pruning, and inference algorithms. From this unified representation, the decoder can extract not only the 1- or k-best translations, but also alignments to a reference, or the quantities necessary to drive discriminative training using gradient-based or gradient-free optimization techniques. Its efficient C++ implementation means that memory use and runtime performance are significantly better than comparable decoders.