Power-law distributions for paraphrases extracted from bilingual corpora

Authors:
Spyros Martzoukos; Christof Monz
Affiliations:
University of Amsterdam, Science Park, XH Amsterdam, The Netherlands (both authors)
Venue:
EACL '12 Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics
Year:
2012

Abstract

We describe a novel method that extracts paraphrases from a bitext for both the source and the target language. To reduce the search space, we decompose the phrase table into sub-phrase-tables and construct separate clusters for source and target phrases. We convert the clusters into graphs, add vertices that act as smoothing and syntactic-information carriers, and compute the similarity between phrases with a random-walk-based measure, the commute time. The resulting phrase-paraphrase probabilities are obtained by converting the commute times into artificial co-occurrence counts with a novel technique; the distribution of these counts belongs to the power-law family.
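The abstract names the commute time but does not say how it is computed. A standard identity (not necessarily the authors' implementation) gives it from the graph Laplacian: for a connected undirected graph with Laplacian L = D - W, the commute time between vertices i and j is C(i, j) = vol(G) * (L+_ii + L+_jj - 2 L+_ij), where L+ is the Moore-Penrose pseudoinverse of L and vol(G) is the sum of all vertex degrees. Smaller commute times mean two vertices are more tightly connected, which is why the measure can serve as a phrase similarity. The minimal sketch below assumes NumPy, a dense symmetric weight matrix, and a hypothetical helper name `commute_times`. It covers only the commute-time step; the paper's conversion into co-occurrence counts is not reproduced.

```python
import numpy as np


def commute_times(W: np.ndarray) -> np.ndarray:
    """Commute-time matrix of a connected, undirected, weighted graph.

    Hypothetical helper (illustration only). W is a symmetric,
    non-negative edge-weight matrix. Computes
        C(i, j) = vol(G) * (L+_ii + L+_jj - 2 * L+_ij),
    where L+ is the pseudoinverse of the Laplacian L = D - W and
    vol(G) is the sum of all vertex degrees.
    """
    W = np.asarray(W, dtype=float)
    if not np.allclose(W, W.T):
        raise ValueError("W must be symmetric (undirected graph)")
    degrees = W.sum(axis=1)              # weighted vertex degrees
    L = np.diag(degrees) - W             # combinatorial graph Laplacian
    L_pinv = np.linalg.pinv(L)           # Moore-Penrose pseudoinverse
    vol = degrees.sum()                  # vol(G) = sum of degrees
    diag = np.diag(L_pinv)
    # Broadcasting forms L+_ii + L+_jj - 2 * L+_ij for every pair (i, j).
    return vol * (diag[:, None] + diag[None, :] - 2.0 * L_pinv)


# Toy path graph 0 - 1 - 2 with unit edge weights.
W = np.array([[0.0, 1.0, 0.0],
              [1.0, 0.0, 1.0],
              [0.0, 1.0, 0.0]])
C = commute_times(W)
print(C)
```

For this path graph the sketch gives a commute time of 4 between adjacent vertices and 8 between the two endpoints, matching the hand computation of expected random-walk round-trip times. Dense pseudoinversion costs O(n^3) time, so this form is only practical for small graphs such as individual clusters.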