Robust extraction of subcategorization data from spoken language

Authors:
Jianguo Li;Chris Brew;Eric Fosler-Lussier
Affiliations:
The Ohio State University;The Ohio State University;The Ohio State University
Venue:
Parsing '05 Proceedings of the Ninth International Workshop on Parsing Technology
Year:
2005

Citing 7
Cited 0

From grammar to lexicon: unsupervised learning of lexical syntax

Computational Linguistics - Special issue on using large corpora: II
A maximum-entropy-inspired parser

NAACL 2000 Proceedings of the 1st North American chapter of the Association for Computational Linguistics conference
Automatic extraction of subcategorization from corpora

ANLC '97 Proceedings of the fifth conference on Applied natural language processing
How verb subcategorization frequencies are affected by corpus choice

COLING '98 Proceedings of the 17th international conference on Computational linguistics - Volume 2
Comlex Syntax: building a computational lexicon

COLING '94 Proceedings of the 15th conference on Computational linguistics - Volume 1
Automatic extraction of subcategorization frames for Czech

COLING '00 Proceedings of the 18th conference on Computational linguistics - Volume 2
Intricacies of Collins' Parsing Model

Computational Linguistics

Quantified Score

Hi-index	0.00

Visualization

Abstract

Subcategorization data has been crucial for various NLP tasks. Current method for automatic SCF acquisition usually proceeds in two steps: first, generate all SCF cues from a corpus using a parser, and then filter out spurious SCF cues with statistical tests. Previous studies on SCF acquisition have worked mainly with written texts; spoken corpora have received little attention. Transcripts of spoken language pose two challenges absent in written texts: uncertainty about utterance segmentation and disfluency.