Capturing errors in written Chinese words

Authors:
Chao-Lin Liu;Kan-Wen Tien;Min-Hua Lai;Yi-Hsuan Chuang;Shih-Hung Wu
Affiliations:
National Chengchi University;National Chengchi University;National Chengchi University;National Chengchi University;Chaoyang University of Technology, Taiwan
Venue:
ACLShort '09 Proceedings of the ACL-IJCNLP 2009 Conference Short Papers
Year:
2009

Abstract

A collection of 3,208 reported errors in Chinese words was analyzed. Of these, 7.2% involved rarely used characters, and 98.4% were assigned common classifications of their causes by human subjects. In particular, 80% of the errors observed in the writings of middle school students were related to pronunciation and 30% were related to the composition of words. Experimental results show that intuitive Web-based statistics captured only about 75% of these errors. In a related task, the Web-based statistics were useful for recommending incorrect characters for composing test items for "incorrect character identification" tests about 93% of the time.