Recognizing noisy romanized Japanese words in learner English

Authors:
Ryo Nagata;Hiromi Sugimoto;Jun-ichi Kakegawa;Yukiko Yabuta
Affiliations:
Konan University, Kobe, Japan;The Japan Institute for Educational Measurement, Inc., Tokyo, Japan;Hyogo University of Teacher Education, Kato, Japan;The Japan Institute for Educational Measurement, Inc., Tokyo, Japan
Venue:
EANL '08 Proceedings of the Third Workshop on Innovative Use of NLP for Building Educational Applications
Year:
2008



Abstract

This paper describes a method for recognizing romanized Japanese words in learner English. Because most of them are unknown words, they act as noise and cause problems in a variety of tasks, including part-of-speech tagging, spell checking, and error detection. A difficulty in recognizing them is that spelling errors often violate the spelling rules of romanized Japanese. To address this problem, the described method uses a clustering algorithm reinforced by a small set of rules. Experiments show that it achieves an F-measure of 0.879 and outperforms other methods. They also show that it requires only the target text and a fair-sized English word list.
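
To make the difficulty concrete, the following is a minimal sketch of a rule-only baseline. It is not the clustering method the paper describes. It assumes a simplified, Hepburn-style syllable pattern (the `MORA` pattern, the function names, and the toy word list are all hypothetical) and flags a token as a candidate romanized Japanese word when it is absent from an English word list and can be segmented entirely into such syllables.

```python
import re

# Assumption: a simplified Hepburn-style mora inventory, written for
# illustration only. It is not the rule set used in the paper.
MORA = r"""
(?:
    (?:ky|gy|sh|ch|ny|hy|by|py|my|ry|j)[auo]   # kya, sha, cho, ju, ...
  | shi | chi | tsu | fu | ji
  | [kgsztdnhbpmr][aiueo]                      # plain consonant + vowel
  | y[auo] | w[ao]
  | [aiueo]                                    # bare vowel
  | n(?![aiueoy])                              # moraic n
  | ([kstp])(?=\1)                             # first half of a doubled consonant
)
"""
ROMAJI = re.compile(rf"(?:{MORA})+", re.VERBOSE)


def looks_like_romaji(token: str) -> bool:
    """Return True if the whole token can be segmented into the morae above."""
    return ROMAJI.fullmatch(token.lower()) is not None


def candidate_romaji(tokens, english_words):
    """Keep tokens that are not known English words and fit the mora pattern."""
    return [
        t for t in tokens
        if t.lower() not in english_words and looks_like_romaji(t)
    ]


# Toy data (hypothetical): "tenpra" stands in for a misspelling of "tenpura".
english = {"i", "ate", "and", "banana", "at", "school"}
tokens = ["I", "ate", "okonomiyaki", "and", "banana", "at", "school", "tenpra"]
found = candidate_romaji(tokens, english)
```

In this toy run, `found` contains only `okonomiyaki`. The misspelled `tenpra` no longer fits the syllable pattern and is missed, which is the kind of failure that motivates going beyond spelling rules alone. The English word list is also needed because words such as `banana` happen to fit the pattern.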