ACM Transactions on Asian Language Information Processing (TALIP)
We analyze a collection of 3208 reported errors in Chinese words. Among these errors, 7.2% involved rarely used characters, and 98.4% were assigned common classifications of their causes by human subjects. In particular, 80% of the errors observed in the writings of middle school students were related to the pronunciations of the words, and 30% were related to their logographs. We conducted experiments on using Web-based statistics to correct the errors, and we designed a software environment for preparing test items in which authors intentionally replace correct characters with wrong ones. Experimental results show that Web-based statistics help correct only about 75% of these errors. In contrast, Web-based statistics are useful for recommending incorrect characters for composing test items for "incorrect character identification" tests about 93% of the time.
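As a rough illustration of how frequency statistics might be used to propose corrections, the following minimal Python sketch ranks candidate words by a count table. It is not the procedure used in this work: the function `rank_corrections`, the confusable-character table `SIMILAR`, and the counts in `COUNTS` are hypothetical stand-ins, and the counts are invented placeholders for whatever Web-based statistics would be collected in practice.

```python
from typing import Dict, List, Tuple

# Hypothetical table of confusable characters (illustrative only).
SIMILAR: Dict[str, List[str]] = {"在": ["再"], "再": ["在"]}

# Invented counts standing in for Web-based statistics (not real data).
COUNTS: Dict[str, int] = {"再見": 9000, "在見": 12}


def rank_corrections(
    word: str,
    position: int,
    similar: Dict[str, List[str]],
    counts: Dict[str, int],
) -> List[Tuple[str, int]]:
    """Rank variants of `word` that differ only at `position`.

    The original word is kept as a candidate, so an already correct
    word can still come out on top.
    """
    original = word[position]
    candidates = [original] + similar.get(original, [])
    scored = []
    for ch in candidates:
        variant = word[:position] + ch + word[position + 1:]
        # Variants missing from the table are treated as having count 0.
        scored.append((variant, counts.get(variant, 0)))
    # Highest count first; ties keep their candidate order.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored


ranking = rank_corrections("在見", 0, SIMILAR, COUNTS)
print(ranking[0][0])
```

The same ranking, read from the other end, hints at how a plausible wrong character could be suggested for an "incorrect character identification" test item: a variant with a low but nonzero count is an error that writers do make.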