Creating a manually error-tagged and shallow-parsed learner corpus

  • Authors:
  • Ryo Nagata; Edward Whittaker; Vera Sheinman

  • Affiliations:
  • Konan University, Okamoto, Kobe, Japan; The Japan Institute for Educational Measurement Inc., Kita-Aoyama, Tokyo, Japan; The Japan Institute for Educational Measurement Inc., Kita-Aoyama, Tokyo, Japan

  • Venue:
  • HLT '11 Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1
  • Year:
  • 2011

Abstract

The availability of learner corpora, especially those that have been manually error-tagged or shallow-parsed, is still limited. As a result, researchers lack a common development and test set for natural language processing of learner English, for example for grammatical error detection. Against this background, we created a novel learner corpus that was manually error-tagged and shallow-parsed. The corpus is available on the web for research and educational purposes. In this paper, we describe it in detail, together with its data-collection method and annotation schemes. A further contribution of this paper is that we take the first step toward evaluating the performance of existing POS-tagging/chunking techniques on learner corpora using the created corpus. These contributions will facilitate further research in related areas such as grammatical error detection and automated essay scoring.
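The abstract mentions evaluating existing POS-tagging/chunking techniques against the manually annotated corpus. As a minimal sketch, assuming the gold annotations are available as per-token POS tags and BIO-style chunk tags (B-NP, I-NP, O, ...), token-level tagging accuracy and chunk-level F-measure could be computed as below. The function names, tag strings and data layout are hypothetical illustrations, not the corpus format or the authors' evaluation procedure.

```python
from typing import List, Set, Tuple

# A chunk is (chunk type, start index, end index exclusive).
# Hypothetical representation chosen for this sketch.
Chunk = Tuple[str, int, int]


def token_accuracy(gold: List[str], pred: List[str]) -> float:
    """Fraction of tokens whose predicted POS tag equals the gold tag."""
    if len(gold) != len(pred):
        raise ValueError("gold and predicted sequences differ in length")
    if not gold:
        return 0.0
    correct = sum(g == p for g, p in zip(gold, pred))
    return correct / len(gold)


def bio_chunks(tags: List[str]) -> Set[Chunk]:
    """Extract (type, start, end) spans from BIO tags such as B-NP, I-NP, O.

    An I- tag that does not continue a chunk of the same type is treated
    as the start of a new chunk (a lenient reading of malformed BIO).
    """
    chunks: Set[Chunk] = set()
    start, ctype = None, None
    for i, tag in enumerate(tags + ["O"]):  # trailing "O" closes a final chunk
        opens_or_closes = (
            tag == "O"
            or tag.startswith("B-")
            or (tag.startswith("I-") and tag[2:] != ctype)
        )
        if opens_or_closes:
            if ctype is not None:
                chunks.add((ctype, start, i))
            start, ctype = (i, tag[2:]) if tag != "O" else (None, None)
    return chunks


def chunk_f1(gold: List[str], pred: List[str]) -> float:
    """Chunk-level F-measure: a chunk counts only if type and span both match."""
    g, p = bio_chunks(gold), bio_chunks(pred)
    correct = len(g & p)
    if correct == 0:
        return 0.0
    precision = correct / len(p)
    recall = correct / len(g)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Toy, made-up annotations for a five-token sentence.
    gold_pos = ["PRP", "VBP", "DT", "NN", "."]
    pred_pos = ["PRP", "VB", "DT", "NN", "."]
    gold_chunk = ["B-NP", "B-VP", "B-NP", "I-NP", "O"]
    pred_chunk = ["B-NP", "B-VP", "B-NP", "B-NP", "O"]
    print(token_accuracy(gold_pos, pred_pos))  # 0.8
    print(round(chunk_f1(gold_chunk, pred_chunk), 4))  # 0.5714
```

Scoring chunks as whole spans, rather than tag by tag, means that splitting one gold noun phrase into two predicted chunks (as in the toy example) costs both precision and recall, which is the usual convention for chunking evaluation.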