Annotating Handwritten Characters with Minimal Human Involvement in a Semi-supervised Learning Strategy

  • Authors:
  • Jan Richarz; Szilard Vajda; Gernot A. Fink

  • Venue:
  • ICFHR '12 Proceedings of the 2012 International Conference on Frontiers in Handwriting Recognition
  • Year:
  • 2012

Abstract

One obstacle in the automatic analysis of handwritten documents is the large amount of labeled data typically needed for classifier training. This is especially true when the document scans are of poor quality and different writers and writing styles have to be covered. Consequently, the considerable human effort required in the process currently prohibits the automatic transcription of large document collections. In this paper, two semi-supervised multiview learning approaches are presented that reduce the manual burden by robustly deriving a large number of labels from relatively few manual annotations. The first is based on cluster-level annotation followed by a majority decision, whereas the second casts the labeling process as a retrieval task and derives labels by voting among ranked lists. Both methods are thoroughly evaluated in a handwritten character recognition scenario using realistic document data. It is demonstrated that competitive recognition performance can be maintained by labeling only a fraction of the data.
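The two labeling ideas named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify features, clustering, ranking functions, or voting rules, so everything below is an assumption made for illustration. The function names `majority_label` and `vote_from_ranked_lists`, the `min_agree` threshold, the top-`k` cutoff, and the choice to leave a sample unlabeled (`None`) when the views disagree are all hypothetical.

```python
from collections import Counter


def majority_label(view_cluster_ids, view_cluster_labels, min_agree=2):
    """Cluster-level annotation followed by a majority decision (sketch).

    view_cluster_ids:    one list per view; entry i is the cluster id of sample i.
    view_cluster_labels: one dict per view; maps cluster id -> label that a
                         human assigned to the whole cluster.
    min_agree:           hypothetical threshold; samples whose top label gets
                         fewer votes stay unlabeled (None).
    """
    n_views = len(view_cluster_ids)
    n_samples = len(view_cluster_ids[0])
    labels = []
    for i in range(n_samples):
        # Each view votes with the label of the cluster the sample fell into.
        votes = Counter(
            view_cluster_labels[v][view_cluster_ids[v][i]] for v in range(n_views)
        )
        label, count = votes.most_common(1)[0]
        labels.append(label if count >= min_agree else None)
    return labels


def vote_from_ranked_lists(ranked_label_lists, k=3):
    """Labeling as retrieval: vote among ranked lists (sketch).

    ranked_label_lists: one list per view; each holds the labels of the
                        annotated samples retrieved for a query, best first.
    k:                  hypothetical cutoff; only the top-k entries vote.
    Returns the winning label, or None on a tie or when no votes exist.
    """
    votes = Counter()
    for ranked in ranked_label_lists:
        votes.update(ranked[:k])
    top = votes.most_common(2)
    if not top:
        return None
    if len(top) == 2 and top[0][1] == top[1][1]:
        return None  # ambiguous: leave for manual annotation
    return top[0][0]


if __name__ == "__main__":
    # Three views, four samples; a human labeled each cluster once per view.
    ids = [[0, 0, 1, 1], [0, 1, 1, 2], [1, 1, 0, 1]]
    names = [{0: "a", 1: "b"}, {0: "a", 1: "b", 2: "c"}, {0: "b", 1: "a"}]
    print(majority_label(ids, names))

    # One query, three views, labels of retrieved samples in rank order.
    print(vote_from_ranked_lists([["a", "a", "b"], ["a", "b", "b"], ["a", "c", "b"]], k=2))
```

In both functions a handful of manual decisions (one label per cluster, or a small annotated pool used for retrieval) is propagated to many samples, and disagreement between views is used as a signal to withhold a label rather than guess.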