Automatic extraction of subcorpora based on subcategorization frames from a part-of-speech tagged corpus

  • Authors: Susanne Gahl
  • Affiliations: ICSI, Berkeley, CA
  • Venue: COLING '98 Proceedings of the 17th international conference on Computational linguistics - Volume 1
  • Year: 1998

Abstract

This paper presents a method for extracting subcorpora documenting different subcategorization frames for verbs, nouns, and adjectives in the 100-million-word British National Corpus. The extraction tool consists of a set of batch files for use with the Corpus Query Processor (CQP), which is part of the IMS Corpus Workbench (cf. Christ 1994a, b). A macroprocessor has been developed that allows the user to specify in a simple input file which subcorpora are to be created for a given lemma. The resulting subcorpora can be used (1) to provide evidence for the subcategorization properties of a given lemma and to facilitate the selection of corpus lines for lexicographic research, and (2) to determine the frequencies of different syntactic contexts of each lemma.
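
The abstract does not show the input file format or the generated batch files, so the following Python sketch only illustrates the general idea of a macroprocessor that expands a short per-lemma specification into one query per subcategorization frame. Everything in it is an assumption made for illustration: the one-line-per-lemma input format, the frame names (np, np_np, that_clause, to_inf), the attribute names lemma, pos, and word, and the CLAWS C5-style tag patterns. The output is written in the style of CQP named queries and may need adjusting to the way a given corpus is encoded.

```python
# Illustrative sketch only: the input format, frame names, attribute names,
# and tag patterns are hypothetical and are not taken from the paper.

# A rough noun-phrase pattern: optional determiner, any adjectives, a noun.
NP = '[pos="AT0|DT0|DPS"]? [pos="AJ.*"]* [pos="NN.*"]'

# Hypothetical frame templates: what may follow the head word in each frame.
FRAME_TEMPLATES = {
    "np": NP,
    "np_np": f"{NP} {NP}",
    "that_clause": '[word="that"]',
    "to_inf": '[pos="TO0"] [pos="V.I"]',
}

# Tag prefix used to restrict the head word to the requested word class.
POS_PREFIX = {"verb": "V", "noun": "N", "adj": "AJ"}


def parse_spec(text):
    """Parse lines of the assumed form 'lemma word_class frame1 frame2 ...'."""
    entries = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        lemma, word_class, *frames = line.split()
        entries.append((lemma, word_class, frames))
    return entries


def expand(entries):
    """Return one CQP-style named query per (lemma, frame) pair."""
    queries = []
    for lemma, word_class, frames in entries:
        head = f'[lemma="{lemma}" & pos="{POS_PREFIX[word_class]}.*"]'
        for frame in frames:
            name = f"{lemma.capitalize()}_{frame}"
            queries.append(f"{name} = {head} {FRAME_TEMPLATES[frame]};")
    return queries


SPEC = """
# lemma  class  frames
give     verb   np np_np
hope     verb   that_clause to_inf
"""

if __name__ == "__main__":
    for query in expand(parse_spec(SPEC)):
        print(query)
```

Each generated line names one subcorpus for one lemma and frame, so counting the matches per query would give the kind of frame frequencies mentioned in use (2). Keeping the frame templates in a single table is a design choice of this sketch: a new frame can be added once and reused for every lemma in the input file.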