Automatic acquisition of subcategorization frames from untagged text

Authors: Michael R. Brent
Affiliations: MIT AI Lab, Cambridge, Massachusetts
Venue: ACL '91 Proceedings of the 29th annual meeting on Association for Computational Linguistics
Year: 1991

Abstract

This paper describes an implemented program that takes a raw, untagged text corpus as its only input (no open-class dictionary) and generates a partial list of verbs occurring in the text and the subcategorization frames (SFs) in which they occur. Verbs are detected by a novel technique based on the Case Filter of Rouvret and Vergnaud (1980). The completeness of the output list increases monotonically with the total number of occurrences of each verb in the corpus. False positive rates are one to three percent of observations. Five SFs are currently detected and more are planned. Ultimately, I expect to provide a large SF dictionary to the NLP community and to train dictionaries for specific corpora.