Visually and Phonologically Similar Characters in Incorrect Chinese Words: Analyses, Identification, and Applications

Authors:
C.-L. Liu; M.-H. Lai; K.-W. Tien; Y.-H. Chuang; S.-H. Wu; C.-Y. Lee
Affiliations:
National Chengchi University; National Chengchi University; National Chengchi University; National Chengchi University; Chaoyang University of Technology; Academia Sinica
Venue:
ACM Transactions on Asian Language Information Processing (TALIP)
Year:
2011

Abstract

Information about students’ mistakes opens a window onto their learning processes and helps us design course work that keeps students from repeating the same errors. Learning from mistakes matters not only in human learning activities; it is also a crucial ingredient in techniques for developing student models. In this article, we report the findings of our study of 4,100 erroneous Chinese words. Seventy-six percent of these errors were related to phonological similarity between the correct and the incorrect characters, 46% were related to visual similarity, and 29% involved both factors. We propose an algorithm that aims to reproduce incorrect Chinese words. The algorithm extends the principles of decomposing Chinese characters with Cangjie codes to judge the visual similarity between characters, and it employs empirical rules to determine the degree of similarity between Chinese phonemes. To evaluate its effectiveness, we ran the algorithm to select and rank a list of about 100 candidate characters, from more than 5,100 characters, for the incorrectly written character in each of the 4,100 errors. We then checked whether the incorrect character was included in the candidate list and whether it was ranked near the top of that list. Experimental results show that our algorithm captured 97% of the incorrect characters in the 4,100 errors when the average length of the candidate lists was 104. Further analyses showed that the incorrect character ranked among the top 10 candidates in 89% of the phonologically similar errors and in 80% of the visually similar errors.
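
The candidate-ranking evaluation described in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: the article extends Cangjie decomposition in ways the abstract does not spell out, whereas this sketch substitutes a plain longest-common-subsequence score over unmodified Cangjie codes. The table `CANGJIE`, the functions `visual_similarity`, `rank_candidates`, and `capture_rate`, and the scoring formula are all assumptions made for illustration, and the toy codes should be checked against a full Cangjie table before any real use.

```python
# Toy Cangjie table (illustrative entries only; verify against a full table).
CANGJIE = {
    "明": "AB",
    "朋": "BB",
    "昌": "AA",
    "林": "DD",
    "森": "DDD",
}


def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of two code strings."""
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def visual_similarity(c1: str, c2: str, table=CANGJIE) -> float:
    """Assumed score in [0, 1]: shared code elements relative to total length."""
    code1, code2 = table[c1], table[c2]
    return 2 * lcs_length(code1, code2) / (len(code1) + len(code2))


def rank_candidates(target: str, table=CANGJIE, top_k: int = 10) -> list:
    """Return up to top_k characters most similar to target, best first."""
    others = [c for c in table if c != target]
    others.sort(key=lambda c: visual_similarity(target, c, table), reverse=True)
    return others[:top_k]


def capture_rate(errors, top_k: int = 10, table=CANGJIE) -> float:
    """Fraction of (correct, incorrect) pairs whose incorrect character
    appears in the top_k candidate list generated for the correct one."""
    hits = sum(
        1 for correct, wrong in errors
        if wrong in rank_candidates(correct, table, top_k)
    )
    return hits / len(errors)
```

Calling `capture_rate` with a list of (correct character, incorrect character) pairs and a chosen `top_k` gives the same kind of figure as the capture percentages reported in the abstract, here computed only for the visual-similarity side and only under the assumed scoring.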