Improving the accuracy of subcategorizations acquired from corpora

Authors:
Naoki Yoshinaga
Affiliations:
University of Tokyo, Bunkyo-ku, Tokyo
Venue:
ACLstudent '04 Proceedings of the ACL 2004 workshop on Student research
Year:
2004

Abstract

This paper presents a method for improving the accuracy of subcategorization frames (SCFs) acquired from corpora in order to augment existing lexicon resources. I estimate a confidence value for each SCF using corpus-based statistics, and then cluster the SCF confidence-value vectors of words to capture co-occurrence tendencies among SCFs in the lexicon. I apply the method to SCFs acquired from corpora using the lexicons of two large-scale lexicalized grammars. The resulting SCFs achieve higher precision and recall than SCFs obtained by a naive frequency cut-off.
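
The two steps named in the abstract (per-SCF confidence values, then clustering of per-word confidence vectors) can be illustrated with a minimal sketch. The abstract does not say which corpus statistics or which clustering algorithm the paper uses, so everything here is an assumption made for illustration: confidence is taken to be plain relative frequency, clustering is a small k-means, and the frame labels, words, counts, and threshold are invented toy data rather than results from the paper.

```python
from collections import Counter

def confidence_vectors(observations, frames):
    """Map each word to a vector of relative SCF frequencies (assumed confidence)."""
    vectors = {}
    for word, seen in observations.items():
        counts = Counter(seen)
        total = sum(counts.values())
        vectors[word] = [counts[f] / total for f in frames]
    return vectors

def frequency_cutoff(vector, frames, threshold):
    """Baseline: keep every SCF whose value reaches the threshold."""
    return {f for f, value in zip(frames, vector) if value >= threshold}

def nearest_centroid(vector, centroids):
    """Index of the centroid closest to `vector` in squared Euclidean distance."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(centroids)), key=lambda i: sq_dist(vector, centroids[i]))

def kmeans(vectors, centroids, iterations=10):
    """Plain k-means over the word vectors, starting from the given centroids."""
    words = sorted(vectors)
    assignment = {}
    for _ in range(iterations):
        assignment = {w: nearest_centroid(vectors[w], centroids) for w in words}
        updated = []
        for i, old in enumerate(centroids):
            members = [vectors[w] for w in words if assignment[w] == i]
            if members:
                updated.append([sum(col) / len(members) for col in zip(*members)])
            else:
                updated.append(old)  # keep an empty cluster's centroid unchanged
        if updated == centroids:
            break
        centroids = updated
    return assignment, centroids

# Hypothetical toy data: observed SCF labels per verb.
frames = ["NP", "NP_PP", "S"]
observations = {
    "give": ["NP_PP"] * 6 + ["NP"] * 4,
    "send": ["NP_PP"] * 5 + ["NP"] * 5,
    "think": ["S"] * 9 + ["NP"] * 1,
    "say": ["S"] * 8 + ["NP"] * 2,
}

vectors = confidence_vectors(observations, frames)
assignment, centroids = kmeans(vectors, [vectors["give"], vectors["think"]])

threshold = 0.12  # arbitrary illustrative cut-off
baseline = frequency_cutoff(vectors["think"], frames, threshold)
clustered = frequency_cutoff(centroids[assignment["think"]], frames, threshold)
```

In this toy setting, `baseline` is decided from the counts for "think" alone, while `clustered` is decided from the centroid of the cluster that "think" falls into, so a frame that is rare for one word can still be kept when similar words exhibit it. This is only one way to turn cluster membership into an SCF decision and should not be read as the procedure used in the paper.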