HIT-OR3C: an opening recognition corpus for Chinese characters

Authors:
Shusen Zhou; Qingcai Chen; Xiaolong Wang
Affiliations:
Harbin Institute of Technology, Shenzhen, P.R. China (all three authors)
Venue:
DAS '10 Proceedings of the 9th IAPR International Workshop on Document Analysis Systems
Year:
2010

Abstract

This paper proposes an opening recognition corpus, HIT-OR3C, together with the toolkit used to construct it, to facilitate unconstrained online Chinese handwritten text recognition. The characters of HIT-OR3C are written on a handwriting pad and are recorded and labeled automatically by the proposed handwriting document collection software, the OR3C Toolkit. HIT-OR3C consists of five subsets: GB1, GB2, Letter, Digit, and Document. The first four subsets together contain 6,825 character categories written by 122 persons, with 832,650 samples in total. The Document subset corresponds to 10 news articles that contain 2,442 categories, written by 20 persons, with 77,168 samples in total. HIT-OR3C can be used to train and evaluate character recognition algorithms. The OR3C Toolkit provides an efficient, device-independent, and unconstrained platform for building large-scale handwriting corpora.
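As a quick sanity check on the corpus figures, the following Python sketch encodes the counts reported in the abstract and tests whether each sample total equals writers × categories. The names `SubsetGroup` and `one_sample_per_category` are hypothetical, chosen only for illustration; they are not part of the OR3C Toolkit, and only the numbers come from the abstract.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class SubsetGroup:
    """Hypothetical summary record for a group of HIT-OR3C subsets (figures from the abstract)."""
    subsets: tuple[str, ...]
    categories: int
    writers: int
    samples: int

    def samples_per_writer(self) -> float:
        # Average number of samples contributed by each writer.
        return self.samples / self.writers


# Counts quoted in the abstract.
CHARACTER_SUBSETS = SubsetGroup(("GB1", "GB2", "Letter", "Digit"), 6825, 122, 832650)
DOCUMENT_SUBSET = SubsetGroup(("Document",), 2442, 20, 77168)


def one_sample_per_category(group: SubsetGroup) -> bool:
    """True if the sample total equals writers x categories exactly."""
    return group.writers * group.categories == group.samples


if __name__ == "__main__":
    for g in (CHARACTER_SUBSETS, DOCUMENT_SUBSET):
        name = "+".join(g.subsets)
        print(f"{name}: {g.samples_per_writer():.1f} samples/writer, "
              f"writers x categories == samples: {one_sample_per_category(g)}")
```

For the first four subsets, 122 × 6,825 = 832,650 exactly, which is consistent with (though does not by itself prove) each writer contributing one sample per category. For the Document subset, 77,168 exceeds 20 × 2,442 = 48,840, so at least some writer/category pairs have more than one sample, as one would expect when characters recur in running text.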