The effect of corpus size on case frame acquisition for discourse analysis

  • Authors:
  • Ryohei Sasano;Daisuke Kawahara;Sadao Kurohashi

  • Affiliations:
  • Kyoto University;National Institute of Information and Communications Technology;Kyoto University

  • Venue:
  • NAACL '09 Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics
  • Year:
  • 2009

Quantified Score

Hi-index 0.00

Visualization

Abstract

This paper reports the effect of corpus size on case frame acquisition for discourse analysis in Japanese. For this study, we collected a Japanese corpus consisting of up to 100 billion words, and constructed case frames from corpora of six different sizes. Then, we applied these case frames to syntactic and case structure analysis, and zero anaphora resolution. We obtained better results by using case frames constructed from larger corpora; the performance was not saturated even with a corpus size of 100 billion words.