Robust extraction of subcategorization data from spoken language

  • Authors:
  • Jianguo Li; Chris Brew; Eric Fosler-Lussier

  • Affiliations:
  • The Ohio State University; The Ohio State University; The Ohio State University

  • Venue:
  • Parsing '05 Proceedings of the Ninth International Workshop on Parsing Technology
  • Year:
  • 2005

Abstract

Subcategorization data has been crucial for various NLP tasks. Current methods for automatic subcategorization frame (SCF) acquisition usually proceed in two steps: first, a parser generates all SCF cues from a corpus; then, statistical tests filter out spurious SCF cues. Previous studies on SCF acquisition have worked mainly with written texts, and spoken corpora have received little attention. Transcripts of spoken language pose two challenges absent in written texts: uncertainty about utterance segmentation, and disfluencies.
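The two-step pipeline described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation. It assumes step 1 (parsing) has already produced a list of `(verb, scf)` cue pairs. For step 2 it uses a binomial hypothesis test, one test often applied to SCF filtering; the abstract does not say which test the paper uses. The SCF labels, the single global `error_rate` (the assumed probability that a parser emits a false cue), and the significance level `alpha` are all hypothetical choices made for illustration.

```python
from collections import Counter
from math import comb


def binomial_tail(n: int, m: int, p: float) -> float:
    """Return P(X >= m) for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(m, n + 1))


def filter_scf_cues(cues, error_rate=0.05, alpha=0.05):
    """Filter spurious SCF cues with a binomial hypothesis test.

    cues: iterable of (verb, scf) pairs produced by a parser (step 1).
    error_rate: hypothetical probability that a single cue is spurious.
    alpha: hypothetical significance level.

    An SCF is accepted for a verb only if it co-occurs with that verb
    more often than cue errors alone would plausibly explain (step 2).
    """
    cues = list(cues)
    pair_counts = Counter(cues)                      # m: cues for (verb, scf)
    verb_counts = Counter(verb for verb, _ in cues)  # n: all cues for the verb
    accepted = {}
    for (verb, scf), m in pair_counts.items():
        n = verb_counts[verb]
        # Small tail probability: m cues are unlikely to be errors alone.
        if binomial_tail(n, m, error_rate) < alpha:
            accepted.setdefault(verb, set()).add(scf)
    return accepted


if __name__ == "__main__":
    # Hypothetical cues for "give": 6 x NP_NP, 5 x NP_PP, 1 x NP.
    cues = [("give", "NP_NP")] * 6 + [("give", "NP_PP")] * 5 + [("give", "NP")]
    print(filter_scf_cues(cues))
```

On this toy input the single `NP` cue is rejected: the chance of at least one spurious cue among 12 is 1 - 0.95^12, about 0.46, well above `alpha`. The two frequent frames are kept. Real systems usually estimate error rates per SCF rather than using one global value.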