Extracting paraphrase patterns from bilingual parallel corpora

Authors:
Shiqi Zhao;Haifeng Wang;Ting Liu;Sheng Li
Affiliations:
Harbin Institute of Technology, No. 27 Jiaohua Street, Nangang District, Harbin 150001, China (e-mails: zhaosq@ir.hit.edu.cn, tliu@ir.hit.edu.cn, lisheng@ir.hit.edu.cn);Toshiba (China) Research and Development Center, No. 1, East Chang An Ave., Dongcheng District, Beijing 100738, China (e-mail: wanghaifeng@rdc.toshiba.com.cn)
Venue:
Natural Language Engineering
Year:
2009

References cited (24)

Numerical recipes in C (2nd ed.): the art of scientific computing.
Information fusion for multidocument summarization: paraphrasing and generation.
Discovery of inference rules for question-answering. Natural Language Engineering.
Extracting paraphrases from a parallel corpus. ACL '01 Proceedings of the 39th Annual Meeting on Association for Computational Linguistics.
Learning surface text patterns for a Question Answering system. ACL '02 Proceedings of the 40th Annual Meeting on Association for Computational Linguistics.
Learning to paraphrase: an unsupervised approach using multiple-sequence alignment. NAACL '03 Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology - Volume 1.
Statistical phrase-based translation. NAACL '03 Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology - Volume 1.
Syntax-based alignment of multiple translations: extracting paraphrases and generating new sentences. NAACL '03 Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology - Volume 1.
Improved statistical alignment models. ACL '00 Proceedings of the 38th Annual Meeting on Association for Computational Linguistics.
Extracting structural paraphrases from aligned monolingual corpora. PARAPHRASE '03 Proceedings of the second international workshop on Paraphrasing - Volume 16.
Paraphrasing with bilingual parallel corpora. ACL '05 Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics.
Improved statistical machine translation using paraphrases. HLT-NAACL '06 Proceedings of the main conference on Human Language Technology Conference of the North American Chapter of the Association of Computational Linguistics.
Paraphrasing for automatic evaluation. HLT-NAACL '06 Proceedings of the main conference on Human Language Technology Conference of the North American Chapter of the Association of Computational Linguistics.
Local similarity between quotiented ordered trees. Journal of Discrete Algorithms.
Tracking and summarizing news on a daily basis with Columbia's Newsblaster. HLT '02 Proceedings of the second international conference on Human Language Technology Research.
Automatic paraphrase acquisition from news articles. HLT '02 Proceedings of the second international conference on Human Language Technology Research.
Dependency parsing based on dynamic local optimization. CoNLL-X '06 Proceedings of the Tenth Conference on Computational Natural Language Learning.
Learning question paraphrases for QA from Encarta logs. IJCAI'07 Proceedings of the 20th international joint conference on Artificial intelligence.
On the role of lexical and world knowledge in RTE3. RTE '07 Proceedings of the ACL-PASCAL Workshop on Textual Entailment and Paraphrasing.
Dependency-based paraphrasing for recognizing textual entailment. RTE '07 Proceedings of the ACL-PASCAL Workshop on Textual Entailment and Paraphrasing.
Semantic and logical inference model for textual entailment. RTE '07 Proceedings of the ACL-PASCAL Workshop on Textual Entailment and Paraphrasing.
Hypothesis transformation and semantic variability rules used in recognizing textual entailment. RTE '07 Proceedings of the ACL-PASCAL Workshop on Textual Entailment and Paraphrasing.
Semantic inference at the lexical-syntactic level for textual entailment recognition. RTE '07 Proceedings of the ACL-PASCAL Workshop on Textual Entailment and Paraphrasing.
Learning alignments and leveraging natural logic. RTE '07 Proceedings of the ACL-PASCAL Workshop on Textual Entailment and Paraphrasing.

Cited by (7)

Using bilingual parallel corpora for cross-lingual textual entailment. HLT '11 Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1.
An empirical evaluation of data-driven paraphrase generation techniques. HLT '11 Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies: short papers - Volume 2.
A generate and rank approach to sentence paraphrasing. EMNLP '11 Proceedings of the Conference on Empirical Methods in Natural Language Processing.
A new sentence compression dataset and its use in an abstractive generate-and-rank sentence compressor. UCNLG+EVAL '11 Proceedings of the UCNLG+Eval: Language Generation and Evaluation Workshop.
Power-law distributions for paraphrases extracted from bilingual corpora. EACL '12 Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics.
Enlarging paraphrase collections through generalization and instantiation. EMNLP-CoNLL '12 Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning.
Extracting context-rich entailment rules from Wikipedia revision history. Proceedings of the 3rd Workshop on the People's Web Meets NLP: Collaboratively Constructed Semantic Resources and their Applications to NLP.

Abstract

Paraphrase patterns are semantically equivalent patterns that are useful in both paraphrase recognition and paraphrase generation. This paper presents a pivot approach for extracting paraphrase patterns from bilingual parallel corpora, in which English paraphrase patterns are extracted using patterns in another language as pivots. We use log-linear models to compute the paraphrase likelihood between pattern pairs, with feature functions based on maximum likelihood estimation (MLE), lexical weighting (LW), and monolingual word alignment (MWA). Using this method, we extract more than 1 million pairs of paraphrase patterns from about 2 million pairs of bilingual parallel sentences. The precision of the extracted paraphrase patterns is above 78%. Experimental results show that the method significantly outperforms a well-known method, discovery of inference rules from text (DIRT), and that the log-linear model with the proposed feature functions is effective. We also analyze the extracted paraphrase patterns in detail and find that they can be classified into five types, which are useful in multiple natural language processing (NLP) applications.
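
The pivot idea and the log-linear combination described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not give the feature definitions, so the MLE feature below is assumed to be the usual pivot sum over p(pivot | e1) * p(e2 | pivot), the score is assumed to be a weighted sum of log feature values, and the counts, pivot labels (`pivot_a`, `pivot_b`), weights, and the placeholder LW and MWA values are all made up for illustration.

```python
import math
from collections import defaultdict

# Hypothetical counts of aligned (English pattern, pivot pattern) pairs.
# "pivot_a"/"pivot_b" stand in for foreign-language patterns; [X] is a slot.
ALIGNED_COUNTS = {
    ("consider [X]", "pivot_a [X]"): 3,
    ("take [X] into consideration", "pivot_a [X]"): 1,
    ("consider [X]", "pivot_b [X]"): 1,
}


def mle_tables(counts):
    """Relative-frequency estimates of p(pivot | english) and p(english | pivot)."""
    english_total = defaultdict(int)
    pivot_total = defaultdict(int)
    for (english, pivot), n in counts.items():
        english_total[english] += n
        pivot_total[pivot] += n
    p_pivot_given_english = {
        (pivot, english): n / english_total[english]
        for (english, pivot), n in counts.items()
    }
    p_english_given_pivot = {
        (english, pivot): n / pivot_total[pivot]
        for (english, pivot), n in counts.items()
    }
    return p_pivot_given_english, p_english_given_pivot


def pivot_mle(e1, e2, counts):
    """Assumed MLE feature: sum over pivots c of p(c | e1) * p(e2 | c)."""
    p_c_given_e, p_e_given_c = mle_tables(counts)
    pivots = {pivot for (_, pivot) in counts}
    return sum(
        p_c_given_e.get((c, e1), 0.0) * p_e_given_c.get((e2, c), 0.0)
        for c in pivots
    )


def log_linear_score(features, weights):
    """Weighted sum of log feature values; every feature value must be > 0."""
    return sum(weights[name] * math.log(value) for name, value in features.items())


if __name__ == "__main__":
    mle = pivot_mle("consider [X]", "take [X] into consideration", ALIGNED_COUNTS)
    # LW and MWA values are placeholders, not computed from any data.
    features = {"mle": mle, "lw": 0.2, "mwa": 0.5}
    weights = {"mle": 1.0, "lw": 1.0, "mwa": 1.0}
    print(mle, log_linear_score(features, weights))
```

With these toy counts, the pivot sum for "consider [X]" to "take [X] into consideration" is 3/4 * 1/4 = 0.1875, since only `pivot_a [X]` links the two patterns. In a real system the weights would be tuned on held-out data rather than set to 1.0.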