Automatic acquisition of subcategorization frames from untagged text

  • Authors: Michael R. Brent
  • Affiliations: MIT AI Lab, Cambridge, Massachusetts
  • Venue: ACL '91: Proceedings of the 29th Annual Meeting of the Association for Computational Linguistics
  • Year: 1991

Abstract

This paper describes an implemented program that takes a raw, untagged text corpus as its only input (no open-class dictionary) and generates a partial list of verbs occurring in the text and the subcategorization frames (SFs) in which they occur. Verbs are detected by a novel technique based on the Case Filter of Rouvret and Vergnaud (1980). The completeness of the output list increases monotonically with the total number of occurrences of each verb in the corpus. False positive rates are one to three percent of observations. Five SFs are currently detected and more are planned. Ultimately, I expect to provide a large SF dictionary to the NLP community and to train dictionaries for specific corpora.