HIT-OR3C: an opening recognition corpus for Chinese characters

  • Authors:
  • Shusen Zhou;Qingcai Chen;Xiaolong Wang

  • Affiliations:
  • Harbin Institute of Technology, Shenzhen, P.R. China;Harbin Institute of Technology, Shenzhen, P.R. China;Harbin Institute of Technology, Shenzhen, P.R. China

  • Venue:
  • DAS '10 Proceedings of the 9th IAPR International Workshop on Document Analysis Systems
  • Year:
  • 2010

Abstract

This paper presents HIT-OR3C, an opening recognition corpus, together with its construction toolkit, to facilitate unconstrained online recognition of handwritten Chinese text. The characters in HIT-OR3C are collected with a handwriting pad and are recorded and labeled automatically by the proposed handwriting document collection software, the OR3C Toolkit. HIT-OR3C consists of 5 subsets: GB1, GB2, Letter, Digit, and Document. The first 4 subsets contain 6,825 categories written by 122 persons, for 832,650 samples in total. The Document subset corresponds to 10 news articles that contain 2,442 categories, written by 20 persons, for 77,168 samples in total. HIT-OR3C can be used for training and evaluating character recognition algorithms. The OR3C Toolkit provides an efficient, device-independent, and unconstrained platform for building large-scale handwriting corpora.
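
The sample counts quoted in the abstract can be sanity-checked with a little arithmetic. The sketch below is only an illustration: it assumes, without confirmation from the abstract, that each of the 122 writers produced every one of the 6,825 categories exactly once. All names in it are hypothetical and are not part of the OR3C Toolkit.

```python
# Figures quoted in the abstract.
CATEGORIES_FIRST_FOUR = 6_825   # categories across GB1, GB2, Letter, Digit
WRITERS_FIRST_FOUR = 122
SAMPLES_FIRST_FOUR = 832_650

DOCUMENT_WRITERS = 20
DOCUMENT_SAMPLES = 77_168


def samples_if_complete(categories: int, writers: int) -> int:
    """Sample count if every writer wrote every category once (an assumption)."""
    return categories * writers


# Compare the assumed product with the total reported in the abstract.
expected = samples_if_complete(CATEGORIES_FIRST_FOUR, WRITERS_FIRST_FOUR)
is_consistent = expected == SAMPLES_FIRST_FOUR

# Average number of Document samples per writer.
document_mean_per_writer = DOCUMENT_SAMPLES / DOCUMENT_WRITERS

print(expected, is_consistent, document_mean_per_writer)
```

Under that assumption the product matches the reported total for the first 4 subsets. The Document subset does not divide evenly among its 20 writers, which suggests that the writers did not all produce the same number of characters; the abstract does not say how the 10 articles were distributed.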