Resolving the unencoded character problem for Chinese digital libraries

Authors:
Derming Juang;Jenq-Haur Wang;Chen-Yu Lai;Ching-Chun Hsieh;Lee-Feng Chien;Jan-Ming Ho
Affiliations:
Academia Sinica, Taiwan;Academia Sinica, Taiwan;Academia Sinica, Taiwan;Academia Sinica, Taiwan;Academia Sinica, Taiwan;Academia Sinica, Taiwan
Venue:
Proceedings of the 5th ACM/IEEE-CS joint conference on Digital libraries
Year:
2005

Citing 2
Cited 7

The Unicode standard, version 2.0
Decomposition for ISO/IEC 10646 ideographic characters

COLING '02 Proceedings of the 3rd workshop on Asian language resources and international standardization - Volume 12

A Mechanism for Solving the Unencoded Chinese Character Problem on the Web

ECDL '08 Proceedings of the 12th European conference on Research and Advanced Technology for Digital Libraries
Using structural information for identifying similar Chinese characters

HLT-Short '08 Proceedings of the 46th Annual Meeting of the Association for Computational Linguistics on Human Language Technologies: Short Papers
Two Applications of Lexical Information to Computer-Assisted Item Authoring for Elementary Chinese

IEA/AIE '09 Proceedings of the 22nd International Conference on Industrial, Engineering and Other Applications of Applied Intelligent Systems: Next-Generation Applied Intelligence
Capturing errors in written Chinese words

ACLShort '09 Proceedings of the ACL-IJCNLP 2009 Conference Short Papers
Phonological and logographic influences on errors in written Chinese words

ALR7 Proceedings of the 7th Workshop on Asian Language Resources
Visually and Phonologically Similar Characters in Incorrect Chinese Words: Analyses, Identification, and Applications

ACM Transactions on Asian Language Information Processing (TALIP)
A cognition-based interactive game platform for learning Chinese characters

Proceedings of the 2011 ACM Symposium on Applied Computing

Abstract

Constructing a Chinese digital library, especially one that archives historical documents, is often hampered by the small character sets supported by current computer systems. This paper aims to resolve the unencoded character problem for Chinese digital libraries with a practical, composite approach. The proposed approach consists of a glyph expression model, a glyph structure database, and supporting tools. It addresses three problems. First, the extensibility of Chinese characters is preserved. Second, unencoded characters become as easy to generate, input, display, and search as encoded ones. Third, the approach is compatible with the existing encoding schemes that most computers use.

This approach has been used by organizations and projects in various application domains, including archeology, linguistics, ancient texts, calligraphy and paintings, and stone and bronze rubbings. For example, at Academia Sinica, a very large full-text database of ancient texts called Scripta Sinica has been created using this approach. The Union Catalog of the National Digital Archives Project (NDAP) dealt with the unencoded characters encountered when merging the metadata of 12 different thematic domains from various organizations. In addition, in the Bronze Inscriptions Research Team (BIRT) of Academia Sinica, 3,459 bronze inscriptions were added, which is very helpful for education and research in historical linguistics.
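
The abstract names a glyph expression model and a glyph structure database but does not describe their notation or schema. As a rough illustration of the general idea only, the sketch below assumes that an unencoded character can be written as an expression over already-encoded components, and borrows Unicode's Ideographic Description Characters (U+2FF0 to U+2FFB) as a stand-in operator set. The names `GlyphDB`, `components`, and the `glyph-00N` labels are hypothetical and do not come from the paper.

```python
# Illustrative sketch only: GlyphDB, components(), and the labels below are
# hypothetical names, not the paper's actual model or database schema.
from dataclasses import dataclass, field

# Unicode Ideographic Description Characters (U+2FF0..U+2FFB), used here as
# stand-in structural operators such as left-right and top-bottom.
IDC_OPERATORS = {chr(cp) for cp in range(0x2FF0, 0x2FFC)}


def components(expression: str) -> list[str]:
    """Return the component characters of an expression, operators removed."""
    return [ch for ch in expression if ch not in IDC_OPERATORS]


@dataclass
class GlyphDB:
    """Toy glyph structure database mapping a label to a glyph expression."""
    entries: dict[str, str] = field(default_factory=dict)

    def add(self, label: str, expression: str) -> None:
        self.entries[label] = expression

    def search_by_component(self, component: str) -> list[str]:
        """Return sorted labels of glyphs whose expression uses the component."""
        return sorted(
            label
            for label, expr in self.entries.items()
            if component in components(expr)
        )


db = GlyphDB()
db.add("glyph-001", "\u2ff0木木")  # left-right composition of 木 and 木
db.add("glyph-002", "\u2ff1日月")  # top-bottom composition of 日 over 月
db.add("glyph-003", "\u2ff0木日")  # left-right composition of 木 and 日

print(db.search_by_component("木"))  # ['glyph-001', 'glyph-003']
```

Because every expression is built from characters that existing encodings already cover, a plain string store and substring-style lookup are enough for this toy version; a real system would also need rendering and input support, which the abstract attributes to the supporting tools.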