Improving the accuracy of subcategorizations acquired from corpora

  • Authors:
  • Naoki Yoshinaga

  • Affiliations:
  • University of Tokyo, Bunkyo-ku, Tokyo

  • Venue:
  • ACLstudent '04: Proceedings of the ACL 2004 Workshop on Student Research
  • Year:
  • 2004

Abstract

This paper presents a method for improving the accuracy of subcategorization frames (SCFs) acquired from corpora in order to augment existing lexicon resources. I estimate a confidence value for each SCF using corpus-based statistics, and then cluster the SCF confidence-value vectors of words to capture co-occurrence tendencies among SCFs in the lexicon. I apply the method to SCFs acquired from corpora using the lexicons of two large-scale lexicalized grammars. The resulting SCFs achieve higher precision and recall than SCFs obtained by a naive frequency cut-off.
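
The pipeline outlined in the abstract can be illustrated with a minimal sketch. The abstract does not say how confidence values are estimated or which clustering algorithm is used, so everything below is an assumption made for illustration: relative frequency stands in for the confidence value, a small k-means with farthest-first initialization stands in for the clustering step, and all function names, SCF labels, and toy counts are hypothetical.

```python
from collections import Counter

def confidence_vectors(observations, scf_inventory):
    """Map each word to a vector of relative SCF frequencies.
    ASSUMPTION: relative frequency is only a stand-in confidence value."""
    counts = {}
    for word, scf in observations:
        counts.setdefault(word, Counter())[scf] += 1
    return {
        word: [c[scf] / sum(c.values()) for scf in scf_inventory]
        for word, c in counts.items()
    }

def frequency_cutoff(vectors, scf_inventory, threshold):
    """Baseline: keep an SCF for a word if its own relative frequency passes the threshold."""
    return {
        word: {s for s, v in zip(scf_inventory, vec) if v >= threshold}
        for word, vec in vectors.items()
    }

def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def farthest_first(points, k):
    """Deterministic initialization: start from the first point, then repeatedly
    add the point farthest from the centroids chosen so far."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        nxt = max(points, key=lambda p: min(squared_distance(p, c) for c in centroids))
        centroids.append(list(nxt))
    return centroids

def kmeans(vectors, k, iterations=10):
    """Plain k-means over the word vectors (ASSUMPTION: not the paper's algorithm)."""
    words = sorted(vectors)
    centroids = farthest_first([vectors[w] for w in words], k)
    assignment = {}
    for _ in range(iterations):
        for w in words:
            assignment[w] = min(range(k), key=lambda i: squared_distance(vectors[w], centroids[i]))
        for i in range(k):
            members = [vectors[w] for w in words if assignment[w] == i]
            if members:
                centroids[i] = [sum(col) / len(members) for col in zip(*members)]
    return assignment, centroids

def cluster_filtered_scfs(vectors, scf_inventory, k, threshold):
    """Keep an SCF for a word if the centroid of the word's cluster passes the threshold,
    so words with similar SCF profiles share evidence."""
    assignment, centroids = kmeans(vectors, k)
    return {
        word: {s for s, v in zip(scf_inventory, centroids[assignment[word]]) if v >= threshold}
        for word in vectors
    }

# Hypothetical toy data: (word, observed SCF) pairs.
SCFS = ["NP", "NP_NP", "NP_PP", "INTRANS"]
observations = (
    [("give", "NP_NP")] * 5 + [("give", "NP_PP")] * 4 + [("give", "NP")]
    + [("send", "NP_PP")] * 8 + [("send", "NP_NP")] + [("send", "NP")]
    + [("sleep", "INTRANS")] * 9 + [("sleep", "NP")]
)
vectors = confidence_vectors(observations, SCFS)
baseline = frequency_cutoff(vectors, SCFS, threshold=0.2)
clustered = cluster_filtered_scfs(vectors, SCFS, k=2, threshold=0.2)
```

In this toy setting the plain cut-off drops the rarely observed `NP_NP` frame for "send", while the cluster-based filter keeps it because "send" is grouped with "give", whose counts support that frame. This only illustrates the general idea of exploiting co-occurrence tendencies among SCFs; it says nothing about the precision and recall reported in the paper.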