Phonological and logographic influences on errors in written Chinese words

Authors:
Chao-Lin Liu; Kan-Wen Tien; Min-Hua Lai; Yi-Hsuan Chuang; Shih-Hung Wu
Affiliations:
National Chengchi University; National Chengchi University; National Chengchi University; National Chengchi University; Chaoyang University of Technology, Taiwan
Venue:
ALR7 Proceedings of the 7th Workshop on Asian Language Resources
Year:
2009

Abstract

We analyze a collection of 3208 reported errors in Chinese words. Among these errors, 7.2% involved rarely used characters, and 98.4% were assigned common classifications of their causes by human subjects. In particular, 80% of the errors observed in the writings of middle school students were related to the pronunciations of the words and 30% were related to their logographs. We conducted experiments on using Web-based statistics to correct the errors, and we designed a software environment for preparing test items in which authors intentionally replace correct characters with wrong ones. Experimental results show that Web-based statistics can help us correct only about 75% of these errors. In contrast, Web-based statistics are useful for recommending incorrect characters for composing test items for "incorrect character identification" tests about 93% of the time.
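The abstract does not spell out how the Web-based statistics are applied, so the following is only a minimal sketch of one plausible reading: substitute phonologically or visually similar characters into a suspect word and rank the resulting candidates by how often each appears. The names `SIMILAR_CHARS`, `TOY_COUNTS`, and `rank_replacements` are hypothetical, and the counts are invented stand-ins for Web hit counts, not data from the paper.

```python
from typing import Dict, List, Tuple

# Hypothetical confusion sets: characters that sound or look alike.
SIMILAR_CHARS: Dict[str, List[str]] = {
    "以": ["已", "乙", "倚"],
}

# Invented counts standing in for Web hit counts; these are NOT real data.
TOY_COUNTS: Dict[str, int] = {
    "已經": 9000,
    "以經": 120,
    "乙經": 3,
    "倚經": 1,
}


def rank_replacements(
    word: str,
    position: int,
    similar: Dict[str, List[str]],
    counts: Dict[str, int],
) -> List[Tuple[str, int]]:
    """Swap the character at `position` for each similar character and
    return the candidate words sorted by count, highest first."""
    original = word[position]
    candidates = []
    for ch in similar.get(original, []):
        candidate = word[:position] + ch + word[position + 1:]
        # Unseen candidates get a count of zero.
        candidates.append((candidate, counts.get(candidate, 0)))
    candidates.sort(key=lambda pair: pair[1], reverse=True)
    return candidates


# Suspect word "以經" with the first character flagged as possibly wrong.
ranked = rank_replacements("以經", 0, SIMILAR_CHARS, TOY_COUNTS)
top_suggestion = ranked[0][0]
```

Under the same assumptions, starting from a correct word and ranking its incorrect variants could suggest plausible wrong characters for "incorrect character identification" test items.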