SCUT-COUCH2009—a comprehensive online unconstrained Chinese handwriting database and benchmark evaluation

  • Authors:
  • Lianwen Jin;Yan Gao;Gang Liu;Yunyang Li;Kai Ding

  • Affiliations:
  • South China University of Technology, School of Electronic and Information Engineering, 381 Wushan Road, Guangzhou, Guangdong, China (all authors)

  • Venue:
  • International Journal on Document Analysis and Recognition - Special Issue on Performance Evaluation
  • Year:
  • 2011

Abstract

This paper introduces SCUT-COUCH2009, a comprehensive online unconstrained Chinese handwriting database. As a revision of SCUT-COUCH2008 [1], the SCUT-COUCH2009 database consists of more datasets with larger vocabularies and more writers. The database is built to facilitate research on unconstrained online Chinese handwriting recognition. It is comprehensive in the sense that it consists of 11 datasets of different vocabularies, named GB1, GB2, TradGB1, Big5, Pinyin, Letters, Digit, Symbol, Word8888, Word17366 and Word44208. In particular, the SCUT-COUCH2009 database contains handwritten samples of 6,763 single Chinese characters in the GB2312-80 standard, 5,401 traditional Chinese characters of the Big5 standard, 1,384 traditional Chinese characters corresponding to the level 1 characters of the GB2312-80 standard, 8,888 frequently used Chinese words, 17,366 daily-used Chinese words, 44,208 complete words from the Fourth Edition of “The Contemporary Chinese Dictionary”, 2,010 Pinyin and 184 daily-used symbols. The samples were collected using personal digital assistants (PDAs) and smartphones with touch screens and were contributed by more than 190 people. The total number of character samples is over 3.6 million. The SCUT-COUCH2009 database is the first publicly available large-vocabulary online Chinese handwriting database containing multi-type character/word samples. We report evaluation results on the database using state-of-the-art recognizers for benchmarking.