Visually and phonologically similar characters in incorrect simplified Chinese words

Authors:
Chao-Lin Liu; Min-Hua Lai; Yi-Hsuan Chuang; Chia-Ying Lee
Affiliations:
National Chengchi University; National Chengchi University; National Chengchi University; National Chengchi University
Venue:
COLING '10 Proceedings of the 23rd International Conference on Computational Linguistics: Posters
Year:
2010

Abstract

Visually and phonologically similar characters are major contributors to errors in Chinese text. By defining appropriate similarity measures based on extended Cangjie codes, we can identify visually similar characters within a fraction of a second. Using the pronunciation information recorded for individual characters in Chinese lexicons, we can compute a list of characters that are phonologically similar to a given character. We collected 621 incorrect Chinese words reported on the Internet and analyzed the causes of these errors. Of these errors, 83% were related to phonological similarity and 48% to visual similarity between the involved characters; because the two figures sum to more than 100%, some errors involve both. The lists of phonologically and visually similar characters generated by our programs contained more than 90% of the incorrect characters in the reported errors.
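The abstract describes three computational steps: finding visually similar characters by comparing character codes, listing phonologically similar characters from lexicon pronunciations, and measuring what fraction of reported incorrect characters fall into the generated lists. The following Python code is a minimal sketch of that pipeline under stated assumptions. The tiny `CANGJIE` and `PINYIN` tables, the difflib-based visual score, the 0.5 threshold, the same-syllable rule for phonological similarity, and the function names `visually_similar`, `phonologically_similar`, and `coverage` are all hypothetical stand-ins, not the paper's extended Cangjie codes or similarity measures. The error pairs are invented for illustration.

```python
from difflib import SequenceMatcher

# Hypothetical toy tables for illustration only. These are ordinary Cangjie
# codes for six characters; the paper's *extended* Cangjie codes (which add
# structural information) are NOT reproduced here.
CANGJIE = {"日": "A", "月": "B", "明": "AB", "木": "D", "林": "DD", "森": "DDD"}
# Mandarin readings written as pinyin with a trailing tone number (toy lexicon).
PINYIN = {"明": "ming2", "名": "ming2", "鸣": "ming2", "命": "ming4",
          "林": "lin2", "临": "lin2", "淋": "lin2", "民": "min2"}


def visually_similar(ch, threshold=0.5):
    """Characters whose code strings resemble ch's code, best match first.

    Assumption: difflib's ratio (2 * matched symbols / total length) on raw
    code strings is used as a stand-in for the paper's similarity measure.
    """
    code = CANGJIE.get(ch)
    if code is None:
        return []
    scored = []
    for other, other_code in CANGJIE.items():
        if other == ch:
            continue
        score = SequenceMatcher(None, code, other_code).ratio()
        if score >= threshold:
            scored.append((-score, other))  # negate so higher scores sort first
    return [other for _, other in sorted(scored)]


def phonologically_similar(ch):
    """Characters sharing ch's syllable; same-tone matches are listed first.

    Assumption: "similar" means identical syllable; near-homophones such as
    min/ming are deliberately not matched in this sketch.
    """
    pron = PINYIN.get(ch)
    if pron is None:
        return []
    syllable, tone = pron[:-1], pron[-1]
    ranked = []
    for other, other_pron in PINYIN.items():
        if other != ch and other_pron[:-1] == syllable:
            # False (same tone) sorts before True (different tone).
            ranked.append((other_pron[-1] != tone, other))
    return [other for _, other in sorted(ranked)]


def coverage(errors):
    """Fraction of (correct, incorrect) character pairs whose incorrect
    character appears in either candidate list for the correct character."""
    hits = sum(
        wrong in visually_similar(right) or wrong in phonologically_similar(right)
        for right, wrong in errors
    )
    return hits / len(errors)


print(visually_similar("林"))        # ['森', '木']
print(phonologically_similar("明"))  # ['名', '鸣', '命']
# Made-up error pairs, not the paper's data: two of the three are covered.
print(coverage([("明", "名"), ("林", "森"), ("林", "民")]))
```

The third invented pair is missed because the sketch's phonological rule only matches identical syllables; a fuller system would presumably also consider near-homophones, and would score visual similarity using structural information rather than plain code strings.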