The effect of corpus size on case frame acquisition for discourse analysis

Authors: Ryohei Sasano; Daisuke Kawahara; Sadao Kurohashi
Affiliations: Kyoto University; National Institute of Information and Communications Technology; Kyoto University
Venue: NAACL '09 Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics
Year: 2009

Abstract

This paper reports the effect of corpus size on case frame acquisition for discourse analysis in Japanese. For this study, we collected a Japanese corpus of up to 100 billion words and constructed case frames from corpora of six different sizes. We then applied these case frames to syntactic and case structure analysis and to zero anaphora resolution. Case frames constructed from larger corpora gave better results, and performance had not saturated even at a corpus size of 100 billion words.