Verb Phrase Ellipsis (VPE) has been studied in great depth in theoretical linguistics, but empirical studies of VPE are rare. We extend the few previous corpus studies with an annotated corpus of VPE covering all 25 sections of the Wall Street Journal corpus (WSJ) distributed with the Penn Treebank. We annotated the raw files using a stand-off annotation scheme that codes the auxiliary verb triggering the elided verb phrase, the start and end of the antecedent, the syntactic type of the antecedent (VP, TV, NP, PP or AP), and the type of syntactic pattern linking the source and target clauses (the clause containing the antecedent and the clause containing the VPE). We found 487 instances of VPE (including predicative ellipsis, antecedent-contained deletion, comparative constructions, and pseudo-gapping) plus 67 cases of related phenomena such as "do so" anaphora. Inter-annotator agreement was high, with an average F-score of 0.97 across three annotators on one section of the WSJ. Our annotation is theory-neutral and has better coverage than earlier efforts that relied on automatic methods. For example, simply searching the parsed version of the Penn Treebank for empty VPs achieves high precision (0.95) but low recall (0.58) when compared with our manual annotation. The distribution of VPE source-target patterns deviates markedly from the standard examples found in the theoretical linguistics literature on VPE, once more underlining the value of corpus studies. The resulting corpus will be useful for studying VPE phenomena as well as for evaluating natural language processing systems equipped with ellipsis resolution algorithms, and we propose evaluation measures for VPE detection and VPE antecedent selection. The stand-off annotation is freely available for research purposes.
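As an illustration only, the sketch below shows one way a stand-off VPE record and a detection score could be represented in Python. The field names, the offset convention, and the file identifiers are hypothetical and are not taken from the released annotation format. The scoring function is ordinary precision, recall and F-score over sets of detected triggers, which may differ from the evaluation measures the paper itself proposes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VPEAnnotation:
    """One stand-off record (hypothetical layout, not the released format)."""
    file_id: str            # raw WSJ file the offsets refer to (assumed naming)
    trigger_start: int      # offsets of the auxiliary that licenses the ellipsis
    trigger_end: int
    antecedent_start: int   # offsets of the antecedent span
    antecedent_end: int
    antecedent_type: str    # one of: VP, TV, NP, PP, AP
    pattern: str            # label for the source-target syntactic pattern

    def trigger_key(self):
        """Identity of the VPE site, used here for detection scoring."""
        return (self.file_id, self.trigger_start, self.trigger_end)


def f_score(precision, recall):
    """Balanced F-score (harmonic mean of precision and recall)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def detection_scores(gold, predicted):
    """Precision, recall and F-score of predicted trigger keys against gold."""
    gold, predicted = set(gold), set(predicted)
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall, f_score(precision, recall)


if __name__ == "__main__":
    # Toy data: three gold VPE sites, two system detections, one of them correct.
    gold = {("wsj_0001", 10, 13), ("wsj_0001", 50, 52), ("wsj_0002", 5, 8)}
    predicted = {("wsj_0001", 10, 13), ("wsj_0003", 1, 3)}
    print(detection_scores(gold, predicted))
    # Balanced F-score implied by the reported empty-VP search figures.
    print(round(f_score(0.95, 0.58), 2))
```

Under this standard definition, the reported precision of 0.95 and recall of 0.58 for the empty-VP search correspond to a balanced F-score of about 0.72. Keying detection on the trigger span alone is a simplifying assumption; antecedent selection would need a separate comparison of antecedent spans.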