Current efforts in syntactic parsing are largely data-driven. These methods require labeled examples of syntactic structures in order to learn the statistical patterns governing them. Labeled data typically requires expert annotators, which makes it both time-consuming and costly to produce. Furthermore, once training data has been created for one textual domain, portability to even similar domains is limited. This domain dependence has inspired a large body of work, since syntactic parsing aims to capture syntactic patterns across an entire language rather than just a specific domain.

The simplest approach to this task assumes that the target domain is essentially the same as the source domain, with no additional knowledge about the target domain included. A more realistic approach assumes that only raw text from the target domain is available. This assumption lends itself well to semi-supervised learning methods, which utilize both labeled and unlabeled examples.

This dissertation focuses on a family of semi-supervised methods called self-training. Self-training creates semi-supervised learners from existing supervised learners with minimal effort. We first show results for self-training on constituency parsing within a single domain. While self-training has failed in this setting in the past, we present a simple modification that allows it to succeed, producing state-of-the-art results for English constituency parsing. Next, we show that self-training is beneficial when parsing across domains and helps further when raw text is available from the target domain.

One remaining issue is that one must choose a training corpus appropriate for the target domain, or performance may be severely impaired. Humans can do this in some situations, but the strategy becomes less practical as data sets grow larger. We present a technique, Any Domain Parsing, which automatically detects useful source domains and mixes them together to produce a customized parsing model.
The resulting models perform almost as well as the best available parsing model for each target domain (an oracle). As a result, we have a fully automatic syntactic constituency parser that can produce high-quality parses for all types of text, regardless of domain.
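The self-training recipe described above is independent of the underlying learner: train on gold data, label the raw target-domain text, then retrain on the union. A minimal sketch, using a hypothetical toy 1-D classifier in place of a full constituency parser (none of these names come from the dissertation):

```python
# Self-training sketch: a supervised learner becomes semi-supervised by
# labeling raw (unlabeled) data itself and retraining on the union.
# CentroidClassifier is a toy stand-in for a parser, for illustration only.

class CentroidClassifier:
    """Toy 1-D nearest-centroid classifier standing in for a parser."""

    def fit(self, xs, ys):
        sums, counts = {}, {}
        for x, y in zip(xs, ys):
            sums[y] = sums.get(y, 0.0) + x
            counts[y] = counts.get(y, 0) + 1
        self.centroids = {y: sums[y] / counts[y] for y in sums}
        return self

    def predict(self, xs):
        return [min(self.centroids, key=lambda y: abs(x - self.centroids[y]))
                for x in xs]


def self_train(labeled_x, labeled_y, unlabeled_x):
    # Step 1: train a supervised model on the gold-labeled examples.
    model = CentroidClassifier().fit(labeled_x, labeled_y)
    # Step 2: have the model label the raw target-domain data itself.
    pseudo_y = model.predict(unlabeled_x)
    # Step 3: retrain on gold plus self-labeled data combined.
    return CentroidClassifier().fit(labeled_x + unlabeled_x,
                                    labeled_y + pseudo_y)


model = self_train([0.0, 10.0], ["a", "b"], [1.0, 2.0, 9.0, 11.0])
print(model.predict([3.0, 8.0]))  # -> ['a', 'b']
```

The retrained centroids shift toward the self-labeled points, which is the essence of the approach: the unlabeled pool refines the decision boundary without any extra annotation effort.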
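The domain-mixing idea behind Any Domain Parsing can be illustrated with a rough sketch: score each candidate source corpus by its similarity to raw target-domain text and turn the scores into mixture weights. The cosine-over-word-counts similarity below is a hypothetical stand-in; the dissertation abstract does not specify the actual detection method.

```python
# Sketch of domain mixing: weight source corpora by their (hypothetical)
# word-distribution similarity to raw text from the target domain.
from collections import Counter
import math


def cosine(a, b):
    """Cosine similarity between two word-count vectors (Counters)."""
    num = sum(a[w] * b[w] for w in a if w in b)
    den = (math.sqrt(sum(v * v for v in a.values()))
           * math.sqrt(sum(v * v for v in b.values())))
    return num / den if den else 0.0


def mixture_weights(target_text, source_corpora):
    """Normalize similarity scores into mixture weights over sources."""
    target = Counter(target_text.split())
    sims = {name: cosine(target, Counter(text.split()))
            for name, text in source_corpora.items()}
    total = sum(sims.values()) or 1.0
    return {name: s / total for name, s in sims.items()}


# Toy corpora for illustration only.
weights = mixture_weights(
    "the gene expression of protein binding",
    {"news": "the stock market fell sharply today",
     "bio": "protein binding and gene expression in cells"},
)
print(weights)  # the "bio" source receives most of the weight
```

A parsing model customized for the target domain could then be trained on source data sampled (or weighted) according to these proportions, approximating the oracle choice of training corpus without human intervention.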