Syntax-based language models for statistical machine translation

  • Authors:
  • Daniel Gildea; Matthew John Post

  • Affiliations:
  • University of Rochester; University of Rochester

  • Year:
  • 2010


Abstract

The goal of machine translation is to develop algorithms that produce human-quality translations of natural language sentences. The evaluation of machine translation quality is split broadly into two aspects: adequacy and fluency. Adequacy measures how faithfully the meaning of the original sentence is preserved, whereas fluency measures whether this meaning is expressed in valid sentences in the target language. While both of these criteria are difficult to meet, fluency is the more difficult goal. This likely has something to do with the asymmetry between producing and understanding sentences: although humans are quite robust at inferring the meaning of text even in the presence of considerable noise and error, the rules that govern grammatical utterances are exacting, subtle, and elusive. To produce understandable text, we can rely on this robust processing hardware, but to produce grammatical text, we have to understand how it works. This dissertation attempts to improve the fluency of machine translation output by explicitly incorporating models of target-language structure into machine translation systems. It is organized into three parts. First, we propose a decoding framework that decouples the structures of the source- and target-language sentences, and we evaluate it with existing grammatical models serving as language models for machine translation. Next, we apply lessons from that task to the learning of grammars better suited to the demands of machine translation. Finally, we incorporate these grammars, called Tree Substitution Grammars, into our decoding framework.
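To make the Tree Substitution Grammar idea concrete, the following is a minimal, hypothetical sketch (not the dissertation's implementation): a TSG generalizes a context-free grammar by letting rules be whole tree fragments ("elementary trees"), which are combined by substituting a fragment whose root label matches an open frontier nonterminal. Multi-word fragments let the grammar memorize larger, fluent chunks of structure. All names and the tuple encoding below are illustrative assumptions.

```python
# Illustrative sketch of a Tree Substitution Grammar (TSG) derivation.
# An elementary tree is a nested tuple: (label, child, child, ...).
# A string leaf in uppercase is a frontier nonterminal awaiting
# substitution; lowercase string leaves are terminal words.
# This encoding is an assumption for illustration only.

def frontier_nonterminals(tree):
    """Yield paths (tuples of child indices) to open substitution sites."""
    if isinstance(tree, str):
        if tree.isupper():
            yield ()
        return
    for i, child in enumerate(tree[1:], start=1):
        for path in frontier_nonterminals(child):
            yield (i,) + path

def substitute(tree, path, fragment):
    """Return a copy of tree with fragment substituted at the site at path."""
    if not path:
        assert tree == fragment[0], "fragment root must match the site label"
        return fragment
    i = path[0]
    return tree[:i] + (substitute(tree[i], path[1:], fragment),) + tree[i + 1:]

def yield_of(tree):
    """Read the terminal string off a (possibly partial) derived tree."""
    if isinstance(tree, str):
        return [tree]
    words = []
    for child in tree[1:]:
        words.extend(yield_of(child))
    return words

# A multi-word elementary tree captures "kicked the bucket" as one unit --
# the kind of larger fragment a learned TSG can memorize for fluency.
s = ("S", "NP", ("VP", ("V", "kicked"),
                 ("NP", ("D", "the"), ("N", "bucket"))))
np = ("NP", ("D", "the"), ("N", "man"))

site = next(frontier_nonterminals(s))   # leftmost open NP site
derived = substitute(s, site, np)
print(" ".join(yield_of(derived)))      # -> the man kicked the bucket
```

Because each elementary tree carries internal structure, a single substitution step can contribute several words and their syntax at once, which is the property that makes TSGs attractive as fluency-oriented language models.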