Journal of the ACM (JACM)
Stochastic inversion transduction grammars and bilingual parsing of parallel corpora
Computational Linguistics
An algorithm for simultaneously bracketing parallel texts by aligning words
ACL '95 Proceedings of the 33rd annual meeting on Association for Computational Linguistics
A syntax-based statistical translation model
ACL '01 Proceedings of the 39th Annual Meeting on Association for Computational Linguistics
Reordering constraints for phrase-based statistical machine translation
COLING '04 Proceedings of the 20th international conference on Computational Linguistics
Syntax-based alignment: supervised or unsupervised?
COLING '04 Proceedings of the 20th international conference on Computational Linguistics
The PASCAL recognising textual entailment challenge
MLCW'05 Proceedings of the First International Conference on Machine Learning Challenges: Evaluating Predictive Uncertainty, Visual Object Classification, and Recognizing Textual Entailment
Paraphrase recognition via dissimilarity significance classification
EMNLP '06 Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing
Paraphrase identification as probabilistic quasi-synchronous recognition
ACL '09 Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP: Volume 1
Generating phrasal and sentential paraphrases: A survey of data-driven methods
Computational Linguistics
We present first results using paraphrase as well as textual entailment data to test the language-universal constraint posited by Wu's (1995, 1997) Inversion Transduction Grammar (ITG) Hypothesis. In machine translation and alignment, the ITG Hypothesis provides a strong inductive bias, and has been shown empirically across numerous language pairs and corpora to yield both efficiency and accuracy gains for various language acquisition tasks. Monolingual paraphrase and textual entailment recognition datasets, however, potentially facilitate closer tests of certain aspects of the hypothesis than bilingual parallel corpora, which simultaneously exhibit many irrelevant dimensions of cross-lingual variation. We investigate this using simple generic Bracketing ITGs containing no language-specific linguistic knowledge. Experimental results on the MSR Paraphrase Corpus show that, even in the absence of any thesaurus to accommodate lexical variation between the paraphrases, an uninterpolated average precision of at least 76% is obtainable from the Bracketing ITG's structure-matching bias alone. This is consistent with experimental results on the PASCAL Recognising Textual Entailment Challenge Corpus, which show surprisingly strong results for a number of the task subsets.
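The abstract does not spell out the reordering constraint behind the ITG Hypothesis. As a minimal sketch, assuming the common characterisation that a Bracketing ITG permits only reorderings that can be built by recursively combining adjacent spans in straight or inverted order, the check below decides whether a word-order permutation is ITG-reachable. The function name `is_itg_permutation` and the shift-reduce formulation are illustrative assumptions, not the paper's implementation.

```python
def is_itg_permutation(perm):
    """Return True if `perm` (a permutation of 1..n) can be produced by
    recursively joining adjacent spans in straight or inverted order.

    Illustrative sketch only: a greedy shift-reduce check, assuming the
    input is a valid permutation of consecutive integers.
    """
    stack = []  # each entry is a (lo, hi) range of target positions covered
    for value in perm:
        lo, hi = value, value
        # Keep merging with the span on top of the stack while the two
        # spans cover contiguous target positions.
        while stack:
            prev_lo, prev_hi = stack[-1]
            if prev_hi + 1 == lo:      # straight order: previous span comes first
                lo = prev_lo
                stack.pop()
            elif hi + 1 == prev_lo:    # inverted order: previous span comes last
                hi = prev_hi
                stack.pop()
            else:
                break
        stack.append((lo, hi))
    # A single remaining span means the whole permutation was bracketed.
    return len(stack) == 1


if __name__ == "__main__":
    for p in ([1, 2, 3, 4], [4, 3, 2, 1], [2, 1, 4, 3], [2, 4, 1, 3], [3, 1, 4, 2]):
        print(p, is_itg_permutation(p))
```

Under this assumption, the two "inside-out" orderings of four elements, `[2, 4, 1, 3]` and `[3, 1, 4, 2]`, are rejected, while monotone, fully inverted, and locally swapped orderings are accepted.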