Recognizing noisy romanized Japanese words in learner English

  • Authors: Ryo Nagata; Hiromi Sugimoto; Jun-ichi Kakegawa; Yukiko Yabuta
  • Affiliations: Konan University, Kobe, Japan; The Japan Institute for Educational Measurement, Inc., Tokyo, Japan; Hyogo University of Teacher Education, Kato, Japan; The Japan Institute for Educational Measurement, Inc., Tokyo, Japan
  • Venue: EANL '08 Proceedings of the Third Workshop on Innovative Use of NLP for Building Educational Applications
  • Year: 2008

Abstract

This paper describes a method for recognizing romanized Japanese words in learner English. Because they are mostly unknown words, they act as noise and cause problems in a variety of tasks, including part-of-speech tagging, spell checking, and error detection. One difficulty in recognizing them is that spelling errors often violate the spelling rules of romanized Japanese. To address this problem, the described method uses a clustering algorithm reinforced by a small set of rules. Experiments show that it achieves an F-measure of 0.879 and outperforms other methods. They also show that it requires only the target text and a reasonably sized English word list.
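
To make the difficulty concrete, the following is a minimal Python sketch of a purely rule-based check. It is not the method described in the paper (the abstract does not give the clustering algorithm or its rules). It only illustrates, under stated assumptions, why spelling rules alone break down on misspelled input and why an English word list helps. The syllable pattern is a simplified, Hepburn-like approximation chosen for illustration, and the names looks_like_romaji and candidate_romaji are hypothetical.

```python
import re

# Assumption: a simplified, Hepburn-like syllable inventory for illustration.
# It is NOT the rule set used in the paper.
ONSET = r"(?:[kgnhbpmr]y|sh|ch|ts|[kgsztdnhfbpmyrwj])"
GEMINATE = r"(?:k(?=k)|s(?=s)|t(?=t)|p(?=p))"  # doubled consonant, as in "gakkou"
MORA = rf"(?:{GEMINATE}?{ONSET}?[aiueo]|n)"    # (C)V unit, or moraic "n"
ROMAJI_RE = re.compile(rf"(?:{MORA})+")


def looks_like_romaji(word: str) -> bool:
    """Return True if the whole word fits the simplified spelling pattern."""
    return ROMAJI_RE.fullmatch(word.lower()) is not None


def candidate_romaji(tokens, english_words):
    """Keep tokens that are absent from the English word list and fit the pattern."""
    return [
        t for t in tokens
        if t.lower() not in english_words and looks_like_romaji(t)
    ]


if __name__ == "__main__":
    english = {"i", "ate", "and", "a", "banana", "in"}
    tokens = ["I", "ate", "sushi", "and", "a", "banana", "in", "tkoyo"]
    print(candidate_romaji(tokens, english))
```

In this sketch, "banana" fits the spelling pattern and is excluded only because it appears in the word list, while the misspelling "tkoyo" (for "tokyo") violates the pattern and is missed entirely. The second case is the kind of failure the abstract points to when it says spelling errors violate the spelling rules.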