Distributional phrasal paraphrase generation for statistical machine translation

Authors:
Yuval Marton
Affiliations:
University of Maryland, Columbia University, and IBM T.J. Watson Research Center, Bellevue, WA
Venue:
ACM Transactions on Intelligent Systems and Technology (TIST) - Special Sections on Paraphrasing; Intelligent Systems for Socially Aware Computing; Social Computing, Behavioral-Cultural Modeling, and Prediction
Year:
2013

References (73)

Suffix arrays: a new method for on-line string searches. SIAM Journal on Computing.
A cooccurrence-based thesaurus and two applications to information retrieval. Information Processing and Management: an International Journal.
Similarity-Based Models of Word Cooccurrence Probabilities. Machine Learning - Special issue on natural language learning.
Foundations of statistical natural language processing.
Automatic sense disambiguation using machine readable dictionaries: how to tell a pine cone from an ice cream cone. SIGDOC '86 Proceedings of the 5th annual international conference on Systems documentation.
Placing search in context: the concept revisited. ACM Transactions on Information Systems (TOIS).
Introduction to Modern Information Retrieval.
An Information-Theoretic Definition of Similarity. ICML '98 Proceedings of the Fifteenth International Conference on Machine Learning.
Multiword Expressions: A Pain in the Neck for NLP. CICLing '02 Proceedings of the Third International Conference on Computational Linguistics and Intelligent Text Processing.
Accurate methods for the statistics of surprise and coincidence. Computational Linguistics - Special issue on using large corpora: I.
The mathematics of statistical machine translation: parameter estimation. Computational Linguistics - Special issue on using large corpora: II.
Discovery of inference rules for question-answering. Natural Language Engineering.
Using syntactic dependency as local context to resolve word sense ambiguity. ACL '98 Proceedings of the 35th Annual Meeting of the Association for Computational Linguistics and Eighth Conference of the European Chapter of the Association for Computational Linguistics.
An IR approach for translating new words from nonparallel, comparable texts. COLING '98 Proceedings of the 17th international conference on Computational linguistics - Volume 1.
A program for aligning sentences in bilingual corpora. ACL '91 Proceedings of the 29th annual meeting on Association for Computational Linguistics.
Replacing suffix trees with enhanced suffix arrays. Journal of Discrete Algorithms - SPIRE 2002.
Automatic identification of word translations from unrelated English and German corpora. ACL '99 Proceedings of the 37th annual meeting of the Association for Computational Linguistics on Computational Linguistics.
Correcting real-word spelling errors by restoring lexical cohesion. Natural Language Engineering.
Extracting paraphrases from a parallel corpus. ACL '01 Proceedings of the 39th Annual Meeting on Association for Computational Linguistics.
Discriminative training and maximum entropy models for statistical machine translation. ACL '02 Proceedings of the 40th Annual Meeting on Association for Computational Linguistics.
Statistical phrase-based translation. NAACL '03 Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology - Volume 1.
Minimum error rate training in statistical machine translation. ACL '03 Proceedings of the 41st Annual Meeting on Association for Computational Linguistics - Volume 1.
Improved statistical alignment models. ACL '00 Proceedings of the 38th Annual Meeting on Association for Computational Linguistics.
Improving Machine Translation Performance by Exploiting Non-Parallel Corpora. Computational Linguistics.
Scaling phrase-based statistical machine translation to larger corpora and longer phrases. ACL '05 Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics.
A hierarchical phrase-based model for statistical machine translation. ACL '05 Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics.
Paraphrasing with bilingual parallel corpora. ACL '05 Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics.
Meaningful clustering of senses helps boost word sense disambiguation performance. ACL-44 Proceedings of the 21st International Conference on Computational Linguistics and the 44th annual meeting of the Association for Computational Linguistics.
Characterising measures of lexical distributional similarity. COLING '04 Proceedings of the 20th international conference on Computational Linguistics.
Improved statistical machine translation using paraphrases. HLT-NAACL '06 Proceedings of the main conference on Human Language Technology Conference of the North American Chapter of the Association of Computational Linguistics.
Hierarchical Phrase-Based Translation. Computational Linguistics.
Corpus-based comprehensive and diagnostic MT evaluation: initial Arabic, Chinese, French, and Spanish results. HLT '02 Proceedings of the second international conference on Human Language Technology Research.
Machine translation by pattern matching.
Four techniques for online handling of out-of-vocabulary words in Arabic-English statistical machine translation. HLT-Short '08 Proceedings of the 46th Annual Meeting of the Association for Computational Linguistics on Human Language Technologies: Short Papers.
Moses: open source toolkit for statistical machine translation. ACL '07 Proceedings of the 45th Annual Meeting of the ACL on Interactive Poster and Demonstration Sessions.
Word lattices for multi-source translation. EACL '09 Proceedings of the 12th Conference of the European Chapter of the Association for Computational Linguistics.
Distributional measures of concept-distance: a task-oriented evaluation. EMNLP '06 Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing.
Syntactic constraints on paraphrases extracted from parallel corpora. EMNLP '08 Proceedings of the Conference on Empirical Methods in Natural Language Processing.
Online large-margin training of syntactic and structural translation features. EMNLP '08 Proceedings of the Conference on Empirical Methods in Natural Language Processing.
Scalable language processing algorithms for the masses: a case study in computing word co-occurrence matrices with MapReduce. EMNLP '08 Proceedings of the Conference on Empirical Methods in Natural Language Processing.
A structured vector space model for word meaning in context. EMNLP '08 Proceedings of the Conference on Empirical Methods in Natural Language Processing.
11,001 new features for statistical machine translation. NAACL '09 Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics.
Using paraphrases for parameter tuning in statistical machine translation. StatMT '07 Proceedings of the Second Workshop on Statistical Machine Translation.
Improving Arabic-Chinese statistical machine translation using English as pivot language. StatMT '09 Proceedings of the Fourth Workshop on Statistical Machine Translation.
Extended gloss overlaps as a measure of semantic relatedness. IJCAI'03 Proceedings of the 18th international joint conference on Artificial intelligence.
Directional distributional similarity for lexical expansion. ACLShort '09 Proceedings of the ACL-IJCNLP 2009 Conference Short Papers.
Introduction of a new paraphrase generation tool based on Monte-Carlo sampling. ACLShort '09 Proceedings of the ACL-IJCNLP 2009 Conference Short Papers.
Paraphrase recognition using machine learning to combine similarity measures. ACLstudent '09 Proceedings of the ACL-IJCNLP 2009 Student Research Workshop.
Paraphrase identification as probabilistic quasi-synchronous recognition. ACL '09 Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP: Volume 1.
Source-language entailment modeling for translating unknown terms. ACL '09 Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP: Volume 2.
Application-driven statistical paraphrase generation. ACL '09 Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP: Volume 2.
Improved statistical machine translation using monolingually-derived paraphrases. EMNLP '09 Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing: Volume 1.
Estimating semantic distance using soft semantic constraints in knowledge-source-corpus hybrid models. EMNLP '09 Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing: Volume 2.
A non-negative tensor factorization model for selectional preference induction. GEMS '09 Proceedings of the Workshop on Geometrical Models of Natural Language Semantics.
Automatic metaphor interpretation as a paraphrasing task. HLT '10 Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics.
Paraphrase lattice for statistical machine translation. ACLShort '10 Proceedings of the ACL 2010 Conference Short Papers.
From frequency to meaning: vector space models of semantics. Journal of Artificial Intelligence Research.
Improving translation via targeted paraphrasing. EMNLP '10 Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing.
Facilitating translation using source language paraphrase lattices. EMNLP '10 Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing.
Example-based paraphrasing for improved phrase-based statistical machine translation. EMNLP '10 Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing.
Phrase clustering for smoothing TM probabilities: or, how to extract paraphrases from phrase tables. COLING '10 Proceedings of the 23rd International Conference on Computational Linguistics.
A survey of paraphrasing and textual entailment methods. Journal of Artificial Intelligence Research.
Measuring semantic distance using distributional profiles of concepts.
Modeling information scent: a comparison of LSA, PMI and GLSA similarity measures on common tests and corpora. Large Scale Semantic Access to Content (Text, Image, Video, and Sound).
Generating phrasal and sentential paraphrases: A survey of data-driven methods. Computational Linguistics.
Deciphering foreign language. HLT '11 Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1.
Extracting paraphrases from definition sentences on the web. HLT '11 Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1.
Translating from morphologically complex languages: a paraphrase-based approach. HLT '11 Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1.
Finding patterns with variable length gaps or don’t cares. COCOON'06 Proceedings of the 12th annual international conference on Computing and Combinatorics.
Filtering antonymous, trend-contrasting, and polarity-dissimilar distributional paraphrases for improving statistical machine translation. WMT '11 Proceedings of the Sixth Workshop on Statistical Machine Translation.
Aligning needles in a haystack: paraphrase acquisition across the web. IJCNLP'05 Proceedings of the Second international joint conference on Natural Language Processing.
Modality and negation in SIMT: use of modality and negation in semantically-informed syntactic MT. Computational Linguistics.
Toward statistical machine translation without parallel corpora. EACL '12 Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics.

Abstract

Paraphrase generation has been shown to be useful for various natural language processing tasks, including statistical machine translation. A commonly used method for paraphrase generation is pivoting [Callison-Burch et al. 2006], which benefits from the linguistic knowledge implicit in the sentence alignment of parallel texts but has limited applicability because it relies on parallel texts. Distributional paraphrasing [Marton et al. 2009a] has wider applicability and is more language-independent, but it does not benefit from any linguistic knowledge. Nevertheless, we show that distributional paraphrasing can yield greater gains in translation tasks. We report method improvements that lead to larger gains than previously published, of almost 2 BLEU points, and we provide implementation details, a complexity analysis, and further insight into the method.
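To make the distributional idea concrete, here is a minimal Python sketch; it is not the paper's implementation. It assumes whitespace-tokenized sentences, a context window of one token on each side, raw co-occurrence counts (no association weighting), and cosine similarity as the ranking measure. Candidates are harvested as spans of up to two tokens that fill the same left/right context slot as some occurrence of the source phrase. The function names `profile`, `candidates`, `cosine`, and `paraphrases` are hypothetical, and a system running over a large monolingual corpus would need an index such as a suffix array instead of these linear scans.

```python
from collections import Counter
from math import sqrt


def profile(sents, phrase, window=1):
    """Distributional profile of `phrase` (a tuple of tokens): counts of the
    words within `window` tokens to its left and right, tagged 'L:' or 'R:'."""
    prof = Counter()
    n = len(phrase)
    for toks in sents:
        for i in range(len(toks) - n + 1):
            if tuple(toks[i:i + n]) == phrase:
                prof.update("L:" + w for w in toks[max(0, i - window):i])
                prof.update("R:" + w for w in toks[i + n:i + n + window])
    return prof


def candidates(sents, phrase, max_len=2):
    """Hypothetical candidate harvesting: spans of 1..max_len tokens that sit
    between the same immediate left and right neighbours as some occurrence
    of `phrase`."""
    n = len(phrase)
    slots = set()
    for toks in sents:
        for i in range(1, len(toks) - n):
            if tuple(toks[i:i + n]) == phrase:
                slots.add((toks[i - 1], toks[i + n]))
    found = set()
    for toks in sents:
        for i in range(1, len(toks) - 1):
            for m in range(1, max_len + 1):
                j = i + m  # index of the right neighbour of the span
                if j < len(toks) and (toks[i - 1], toks[j]) in slots:
                    span = tuple(toks[i:j])
                    if span != phrase:
                        found.add(span)
    return found


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(v * b[k] for k, v in a.items() if k in b)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def paraphrases(sentences, phrase, window=1, max_len=2, k=5):
    """Rank harvested candidates by profile similarity to `phrase`."""
    sents = [s.split() for s in sentences]
    target = tuple(phrase.split())
    tprof = profile(sents, target, window)
    scored = [(" ".join(c), cosine(tprof, profile(sents, c, window)))
              for c in candidates(sents, target, max_len)]
    return sorted(scored, key=lambda x: (-x[1], x[0]))[:k]


corpus = [
    "the cat sat on the mat",
    "the dog sat on the mat",
    "a cat ran in the park",
    "a dog ran in the park",
    "the kitten sat on the rug",
]
print(paraphrases(corpus, "cat"))  # dog (1.0) ranks above kitten (about 0.71)
```

On this toy corpus, "dog" appears in both contexts of "cat" and scores 1.0, while "kitten" shares only the "the ... sat" slot and scores lower. Keeping left and right contexts as separate features (the `L:`/`R:` prefixes) is one design choice; merging them into a single bag of words is a plausible alternative.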