Can document selection help semi-supervised learning?: a case study on event extraction

Authors:
Shasha Liao;Ralph Grishman
Affiliations:
New York University;New York University
Venue:
HLT '11 Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies: short papers - Volume 2
Year:
2011

Citing 11
Cited 0

Snowball: extracting relations from large plain-text collections

DL '00 Proceedings of the fifth ACM conference on Digital libraries
Unsupervised word sense disambiguation rivaling supervised methods

ACL '95 Proceedings of the 33rd annual meeting on Association for Computational Linguistics
Automatic acquisition of domain knowledge for Information Extraction

COLING '00 Proceedings of the 18th conference on Computational linguistics - Volume 2
Counter-training in discovery of semantic patterns

ACL '03 Proceedings of the 41st Annual Meeting on Association for Computational Linguistics - Volume 1
A semantic approach to IE pattern induction

ACL '05 Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics
Redundancy-based correction of automatically extracted facts

HLT '05 Proceedings of the conference on Human Language Technology and Empirical Methods in Natural Language Processing
Predicting unknown time arguments based on cross-event propagation

ACLShort '09 Proceedings of the ACL-IJCNLP 2009 Conference Short Papers
Bootstrapping events and relations from text

Bootstrapping events and relations from text
Using document level cross-event inference to improve event extraction

ACL '10 Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics
Automatically generating extraction patterns from untagged text

AAAI'96 Proceedings of the thirteenth national conference on Artificial intelligence - Volume 2
Filtered ranking for bootstrapping in event extraction

COLING '10 Proceedings of the 23rd International Conference on Computational Linguistics

Quantified Score

Hi-index	0.00

Visualization

Abstract

Annotating training data for event extraction is tedious and labor-intensive. Most current event extraction tasks rely on hundreds of annotated documents, but this is often not enough. In this paper, we present a novel self-training strategy, which uses Information Retrieval (IR) to collect a cluster of related documents as the resource for bootstrapping. Also, based on the particular characteristics of this corpus, global inference is applied to provide more confident and informative data selection. We compare this approach to self-training on a normal newswire corpus and show that IR can provide a better corpus for bootstrapping and that global inference can further improve instance selection. We obtain gains of 1.7% in trigger labeling and 2.3% in role labeling through IR and an additional 1.1% in trigger labeling and 1.3% in role labeling by applying global inference.