Filtering antonymous, trend-contrasting, and polarity-dissimilar distributional paraphrases for improving statistical machine translation

Authors: Yuval Marton; Ahmed El Kholy; Nizar Habash
Affiliations: T.J. Watson Research Center, IBM; Columbia University; Columbia University
Venue: WMT '11 Proceedings of the Sixth Workshop on Statistical Machine Translation
Year: 2011

Abstract

Paraphrases are useful for statistical machine translation (SMT) and other natural language processing tasks. Distributional paraphrase generation is independent of parallel texts and syntactic parses, and hence is also suitable for resource-poor languages, but it tends to erroneously rank antonymous, trend-contrasting, and polarity-dissimilar candidates as good paraphrases. We present a novel method for improving distributional paraphrasing by filtering out such candidates. We evaluate it in simulated low- and mid-resourced SMT tasks, translating from English into two quite different languages, Chinese and Arabic. We show statistically significant gains in English-to-Chinese translation quality, up to 1 BLEU point over non-filtered paraphrase-augmented models (1.6 BLEU points over the baseline). We also show that obtaining gains when translating into Arabic, a morphologically rich language, is not straightforward.
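
The filtering idea can be illustrated with a minimal sketch. The abstract does not say how contrasting candidates are detected, so everything here is an assumption made for illustration: the toy lexicons ANTONYMS, TREND, and POLARITY, the function names, and the similarity scores are hypothetical and are not the authors' resources or implementation.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

# Hypothetical toy lexicons (assumptions, not taken from the paper).
ANTONYMS: Set[FrozenSet[str]] = {frozenset(("good", "bad"))}
TREND: Dict[str, int] = {"rise": 1, "increase": 1, "fall": -1, "decrease": -1}
POLARITY: Dict[str, int] = {"good": 1, "excellent": 1, "bad": -1, "poor": -1}


def is_contrasting(phrase: str, candidate: str) -> bool:
    """Return True if the candidate is an antonym of the phrase, or has the
    opposite trend direction or sentiment polarity, per the toy lexicons."""
    if frozenset((phrase, candidate)) in ANTONYMS:
        return True
    for lexicon in (TREND, POLARITY):
        # Words missing from a lexicon are treated as neutral (0).
        if lexicon.get(phrase, 0) * lexicon.get(candidate, 0) < 0:
            return True
    return False


def filter_paraphrases(
    phrase: str, candidates: List[Tuple[str, float]]
) -> List[Tuple[str, float]]:
    """Drop contrasting candidates; keep the rest in their original order."""
    return [(c, score) for c, score in candidates if not is_contrasting(phrase, c)]
```

For example, given the phrase "increase" and the scored candidates "rise" and "decrease", only "rise" would survive the filter. In an SMT setting, the surviving candidates would then be the ones used to augment the translation model.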