Acquiring entailment pairs across languages and domains: a data analysis

Authors:
Manaal Faruqui; Sebastian Padó
Affiliations:
Indian Institute of Technology, Kharagpur, India; Universität Heidelberg, Heidelberg, Germany
Venue:
IWCS '11 Proceedings of the Ninth International Conference on Computational Semantics
Year:
2011

Abstract

Entailment pairs are sentence pairs consisting of a premise and a hypothesis, where the premise textually entails the hypothesis. Such pairs are important for the development of Textual Entailment systems. In this paper, we take a closer look at a prominent strategy for acquiring them automatically from newspaper corpora: pairing the first sentence of an article with its title. We propose a simple logistic regression model that incorporates and extends this heuristic, and we investigate its robustness across three languages and three domains. We identify two predictors that predict entailment pairs with fairly high accuracy across all languages. However, we find that robustness across domains within a language is more difficult to achieve.
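
As a rough illustration of the kind of model the abstract describes, the following minimal sketch scores (title, first sentence) pairs with a small logistic regression. The abstract does not name the two predictors, so the features used here (word overlap and length ratio) are assumptions chosen only for illustration. The helper names (`tokens`, `features`, `train`, `predict`) and the toy pairs are likewise hypothetical and are not taken from the paper.

```python
import math


def tokens(text):
    """Lowercased word set with surrounding punctuation stripped."""
    return {w.strip(".,;:!?\"'()").lower() for w in text.split()} - {""}


def features(title, sentence):
    """Hypothetical predictors (assumed for illustration, not the paper's)."""
    t, s = tokens(title), tokens(sentence)
    # Fraction of title words that also occur in the first sentence.
    overlap = len(t & s) / len(t) if t else 0.0
    # Ratio of the shorter word set to the longer one.
    length_ratio = min(len(t), len(s)) / max(len(t), len(s)) if t and s else 0.0
    return [1.0, overlap, length_ratio]  # the leading 1.0 is the bias term


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights, x):
    """Estimated probability that the pair is an entailment pair."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)))


def train(examples, labels, lr=0.5, epochs=2000):
    """Fit logistic regression weights by stochastic gradient descent."""
    weights = [0.0] * len(examples[0])
    for _ in range(epochs):
        for x, y in zip(examples, labels):
            error = predict(weights, x) - y
            weights = [w - lr * error * xi for w, xi in zip(weights, x)]
    return weights


# Invented toy data: (title, first sentence, 1 if the sentence entails the title).
toy_pairs = [
    ("Company X buys Company Y",
     "Company X buys Company Y for two billion dollars.", 1),
    ("Storm hits coast", "A storm hits the northern coast on Monday.", 1),
    ("A night to remember", "The orchestra opened its season on Friday.", 0),
    ("Ten questions for the mayor", "Voters go to the polls next week.", 0),
]

X = [features(title, sentence) for title, sentence, _ in toy_pairs]
y = [label for _, _, label in toy_pairs]
weights = train(X, y)

for (title, _, label), x in zip(toy_pairs, X):
    print(f"p={predict(weights, x):.2f}  gold={label}  {title!r}")
```

A real study would use annotated pairs from each language and domain, and would evaluate on held-out data rather than on the training pairs as this toy example does.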