COLING '98 Proceedings of the 17th international conference on Computational linguistics - Volume 1
Accurate unlexicalized parsing
ACL '03 Proceedings of the 41st Annual Meeting on Association for Computational Linguistics - Volume 1
The Proposition Bank: An Annotated Corpus of Semantic Roles
Computational Linguistics
The Notion of Argument in Prepositional Phrase Attachment
Computational Linguistics
COLING-ACL '06 Proceedings of the COLING/ACL on Interactive presentation sessions
The second release of the RASP system
COLING-ACL '06 Proceedings of the COLING/ACL on Interactive presentation sessions
Feature forest models for probabilistic hpsg parsing
Computational Linguistics
A subcategorization acquisition system for French verbs
HLT-SRWS '08 Proceedings of the 46th Annual Meeting of the Association for Computational Linguistics on Human Language Technologies: Student Research Workshop
BioNLP '06 Proceedings of the Workshop on Linking Natural Language Processing and Biology: Towards Deeper Biological Literature Analysis
Fully unsupervised core-adjunct argument classification
ACL '10 Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics
Comparing and combining chunkers of biomedical text
Journal of Biomedical Informatics
Evaluating unsupervised learning for natural language processing tasks
EMNLP '11 Proceedings of the First Workshop on Unsupervised Learning in NLP
Acquisition and evaluation of verb subcategorization resources for biomedicine
Journal of Biomedical Informatics
Hi-index | 0.00 |
Information about verb subcategorization frames (SCFs) is important to many tasks in natural language processing (NLP) and, in turn, text mining. Biomedicine has a need for high-quality SCF lexicons to support the extraction of information from the biomedical literature, which helps biologists to take advantage of the latest biomedical knowledge despite the overwhelming growth of that literature. Unfortunately, techniques for creating such resources for biomedical text are relatively undeveloped compared to general language. This paper serves as an introduction to subcategorization and existing approaches to acquisition, and provides motivation for developing techniques that address issues particularly important to biomedical NLP. First, we give the traditional linguistic definition of subcategorization, along with several related concepts. Second, we describe approaches to learning SCF lexicons from large data sets for general and biomedical domains. Third, we consider the crucial issue of linguistic variation between biomedical fields (subdomain variation). We demonstrate significant variation among subdomains, and find the variation does not simply follow patterns of general lexical variation. Finally, we note several requirements for future research in biomedical SCF lexicon acquisition: a high-quality gold standard, investigation of different definitions of subcategorization, and minimally-supervised methods that can learn subdomain-specific lexical usage without the need for extensive manual work.