IRASubcat, a highly customizable, language independent tool for the acquisition of verbal subcategorization information from corpus

  • Authors:
  • Ivana Romina Altamirano;Laura Alonso i Alemany

  • Affiliations:
  • Universidad Nacional de Córdoba, Córdoba, Argentina;Universidad Nacional de Córdoba, Córdoba, Argentina

  • Venue:
  • YIWCALA '10 Proceedings of the NAACL HLT 2010 Young Investigators Workshop on Computational Approaches to Languages of the Americas
  • Year:
  • 2010

Abstract

IRASubcat is a language-independent tool for acquiring information about the subcategorization of verbs from corpora. The tool can extract information from corpora annotated at various levels, including almost raw text in which only verbs are identified. It can also aggregate information from a pre-existing lexicon with verbal subcategorization information. The system is highly customizable and uses XML as its input and output format. IRASubcat identifies patterns of constituents in the corpus and associates a pattern with a verb if their co-occurrence exceeds a frequency threshold and passes the likelihood ratio hypothesis test. It also implements a procedure to identify verbal constituents that could be playing the role of an adjunct in a pattern. The thresholds controlling frequency and the identification of adjuncts can be customized by the user; otherwise they take default values.
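
The two-step filter described in the abstract (a frequency threshold followed by a likelihood ratio test) can be illustrated with a minimal sketch. This is not IRASubcat's code: the function names, the 2x2 contingency-table formulation of the log-likelihood ratio statistic, the default threshold of 5 occurrences, and the critical value 3.84 (chi-square, one degree of freedom, p = 0.05) are all assumptions chosen for illustration.

```python
import math


def log_likelihood_ratio(k11, k12, k21, k22):
    """G^2 statistic for a 2x2 contingency table of counts.

    k11: verb with pattern        k12: verb without pattern
    k21: other verbs with pattern k22: other verbs without pattern
    """
    observed = ((k11, k12), (k21, k22))
    rows = (k11 + k12, k21 + k22)
    cols = (k11 + k21, k12 + k22)
    total = rows[0] + rows[1]
    g2 = 0.0
    for i in range(2):
        for j in range(2):
            o = observed[i][j]
            if o > 0:  # empty cells contribute nothing to the sum
                expected = rows[i] * cols[j] / total
                g2 += o * math.log(o / expected)
    return 2.0 * g2


def accept_pattern(pair_count, verb_count, pattern_count, total,
                   min_freq=5, critical=3.84):
    """Hypothetical filter: keep a (verb, pattern) pair only if it is
    frequent enough AND positively associated according to G^2.
    min_freq and critical are illustrative defaults, not the tool's."""
    if pair_count < min_freq:
        return False
    # Only positive associations count: observed must exceed the
    # co-occurrence count expected under independence.
    if pair_count <= verb_count * pattern_count / total:
        return False
    g2 = log_likelihood_ratio(
        pair_count,
        verb_count - pair_count,
        pattern_count - pair_count,
        total - verb_count - pattern_count + pair_count,
    )
    return g2 > critical
```

For example, under these assumptions `accept_pattern(30, 40, 50, 1000)` keeps a pattern seen in 30 of a verb's 40 occurrences, while a pair seen only twice is rejected by the frequency threshold before the test is applied.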