Phrasal equivalence classes for generalized corpus-based machine translation

Authors:
Rashmi Gangadharaiah;Ralf D. Brown;Jaime Carbonell
Affiliations:
Carnegie Mellon University;Carnegie Mellon University;Carnegie Mellon University
Venue:
CICLing'11 Proceedings of the 12th international conference on Computational linguistics and intelligent text processing - Volume Part II
Year:
2011

Citing 14
Cited 0

Learning translation templates from examples

Information Systems - Special issue on selected papers from 6th annual workshop on information technologies and systems, December 1996, Cleveland, Ohio, USA
Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data

ICML '01 Proceedings of the Eighteenth International Conference on Machine Learning
Phrase-Based Statistical Machine Translation

KI '02 Proceedings of the 25th Annual German Conference on AI: Advances in Artificial Intelligence
A systematic comparison of various statistical alignment models

Computational Linguistics
Automated generalization of translation examples

COLING '00 Proceedings of the 18th conference on Computational linguistics - Volume 1
Learning translation templates from bilingual text

COLING '92 Proceedings of the 14th conference on Computational linguistics - Volume 2
Example-Based Machine Translation in the Pangloss system

COLING '96 Proceedings of the 16th conference on Computational linguistics - Volume 1
A syntax-based statistical translation model

ACL '01 Proceedings of the 39th Annual Meeting on Association for Computational Linguistics
BLEU: a method for automatic evaluation of machine translation

ACL '02 Proceedings of the 40th Annual Meeting on Association for Computational Linguistics
Is it harder to parse Chinese, or the Chinese Treebank?

ACL '03 Proceedings of the 41st Annual Meeting on Association for Computational Linguistics - Volume 1
A hierarchical phrase-based model for statistical machine translation

ACL '05 Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics
Moses: open source toolkit for statistical machine translation

ACL '07 Proceedings of the 45th Annual Meeting of the ACL on Interactive Poster and Demonstration Sessions
Spectral clustering for example based machine translation

NAACL-Short '06 Proceedings of the Human Language Technology Conference of the NAACL, Companion Volume: Short Papers
From frequency to meaning: vector space models of semantics

Journal of Artificial Intelligence Research

Quantified Score

Hi-index	0.00

Visualization

Abstract

Generalizations of sentence-pairs in Example-based Machine Translation (EBMT) have been shown to increase coverage and translation quality in the past. These template-based approaches (G-EBMT) find common patterns in the bilingual corpus to generate generalized templates. In the past, patterns in the corpus were found by only few of the following ways: finding similar or dissimilar portions of text in groups of sentence-pairs, finding semantically similar words, or use dictionaries and parsers to find syntactic correspondences. This paper combines all the three aspects for generating templates. In this paper, the boundaries for aligning and extracting members (phrase-pairs) for clustering are found using chunkers (hence, syntactic information) trained independently on the two languages under consideration. Then semantically related phrase-pairs are grouped based on the contexts in which they appear. Templates are then constructed by replacing these clustered phrase-pairs by their class labels.We also perform a filtration step by simulating human labelers to obtain only those phrase-pairs that have high correspondences between the source and the target phrases that make up the phrase-pairs. Templates with English-Chinese and English-French language pairs gave significant improvements over a baseline with no templates.