Clustering and matching headlines for automatic paraphrase acquisition

Authors:
Sander Wubben;Antal van den Bosch;Emiel Krahmer;Erwin Marsi
Affiliations:
Tilburg University, The Netherlands;Tilburg University, The Netherlands;Tilburg University, The Netherlands;Tilburg University, The Netherlands
Venue:
ENLG '09 Proceedings of the 12th European Workshop on Natural Language Generation
Year:
2009

Abstract

To develop a data-driven text rewriting algorithm for paraphrasing, it is essential to have a monolingual corpus of aligned paraphrased sentences. News article headlines are a rich source of paraphrases: they tend to describe the same event in different ways and can easily be obtained from the web. We compare two methods of aligning headlines to construct such an aligned corpus of paraphrases, one based on clustering and the other on pairwise similarity-based matching. We show that the latter performs better on the task of aligning paraphrastic headlines.
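
To make the idea of pairwise similarity-based matching concrete, here is a minimal sketch in Python. It is not the authors' implementation: the bag-of-words representation, the cosine measure, the threshold of 0.5, the function names, and the example headlines are all assumptions chosen for illustration, since the abstract does not specify them.

```python
import math
import re
from collections import Counter
from itertools import combinations


def tokenize(headline):
    """Lowercase a headline and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", headline.lower())


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def match_headlines(headlines, threshold=0.5):
    """Return (i, j, score) for each headline pair scoring at or above threshold.

    The threshold value is a hypothetical choice, not taken from the paper.
    """
    vectors = [Counter(tokenize(h)) for h in headlines]
    pairs = []
    for i, j in combinations(range(len(headlines)), 2):
        score = cosine(vectors[i], vectors[j])
        if score >= threshold:
            pairs.append((i, j, score))
    return pairs


# Made-up headlines, used only to exercise the sketch.
headlines = [
    "Police investigate bank robbery in Amsterdam",
    "Bank robbery in Amsterdam investigated by police",
    "Stock markets close higher after rate cut",
]
pairs = match_headlines(headlines)
for i, j, score in pairs:
    print(f"{score:.2f}  {headlines[i]}  <->  {headlines[j]}")
```

In this sketch every pair of headlines is scored independently, so a headline can be aligned with several others without first being assigned to a single group. A clustering-based approach would instead partition the headlines into groups and treat the members of each group as mutual paraphrases.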