Corpus-based HIT-MW database for offline recognition of general-purpose Chinese handwritten text

  • Authors: Tonghua Su; Tianwen Zhang; Dejun Guan

  • Affiliations: Harbin Institute of Technology, School of Computer Science and Technology, Harbin, China; Harbin Institute of Technology, School of Computer Science and Technology, Harbin, China; Harbin Engineering University, School of Computer Science and Technology, Harbin, China

  • Venue: International Journal on Document Analysis and Recognition
  • Year: 2007

Abstract

A Chinese handwriting database named HIT-MW is presented to facilitate offline Chinese handwritten text recognition. Both the writers and the texts to be hand-copied are carefully sampled according to a systematic scheme. To collect naturally written handwriting, forms are distributed by postal mail or through intermediaries rather than face to face. The current version of HIT-MW includes 853 forms and 186,444 characters, produced under unconstrained conditions without preprinted character boxes. The statistics show that the database represents real handwriting very well. The database can support many new applications in real handwriting recognition.