Acquiring paraphrases from text corpora

Authors:
Rahul Bhagat;Eduard Hovy;Siddharth Patwardhan
Affiliations:
USC Information Sciences Institute, Marina del Rey, CA, USA;USC Information Sciences Institute, Marina del Rey, CA, USA;University of Utah, Salt Lake City, UT, USA
Venue:
Proceedings of the fifth international conference on Knowledge capture
Year:
2009

Abstract

Paraphrases are textual expressions that convey the same meaning using different surface forms. Capturing the variability of language, they play an important role in many natural language applications, including question answering, machine translation, and multi-document summarization. In linguistics, paraphrases are characterized by approximate conceptual equivalence. Since no automated semantic interpretation system available today can identify conceptual equivalence, paraphrases are difficult to acquire without human effort. In this paper, we present a method for automatically acquiring paraphrases using a monolingual corpus. We learn paraphrases at both the surface and lexico-syntactic levels and build two paraphrase resources, each containing about 2 million phrases. We evaluate these paraphrases extrinsically by using them to learn patterns for Information Extraction (IE). We show that the lexico-syntactic paraphrases perform better than the surface-level paraphrases for IE. We further show that the patterns learned using the lexico-syntactic paraphrases attain performance comparable to the traditional IE approach of learning patterns from domain-specific corpora.