Self-training Strategies for Handwriting Word Recognition

  • Authors:
  • Volkmar Frinken;Horst Bunke

  • Affiliations:
  • Institute of Computer Science and Applied Mathematics, University of Bern, Bern, Switzerland CH-3012;Institute of Computer Science and Applied Mathematics, University of Bern, Bern, Switzerland CH-3012

  • Venue:
  • ICDM '09 Proceedings of the 9th Industrial Conference on Advances in Data Mining. Applications and Theoretical Aspects
  • Year:
  • 2009

Abstract

Handwriting recognition is an emerging subfield of human-computer interaction with many potential industrial applications, e.g. in postal automation, bank check processing, and automatic form reading. Training a recognizer, however, requires a substantial number of training examples together with their corresponding ground truth, which has to be created by humans. A promising way to significantly reduce this effort, and hence cut system development costs, is offered by semi-supervised learning, in which both text with and text without transcription are used for training. To date, however, there is no straightforward and established method for semi-supervised learning, particularly not for handwriting recognition. In the self-training approach, an initially trained recognition system creates a new training set from unlabeled data, and a new recognizer is then trained on this set. The training set is created by selecting elements from the unlabeled set according to their recognition confidence, so the success of self-training depends crucially on the data selected. In this paper, we test and compare different rules for selecting new training data for single word recognition, with and without additional language information in the form of a dictionary. We demonstrate that it is possible to substantially increase the recognition accuracy for both systems.
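
The self-training loop described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' system: the one-dimensional nearest-centroid "recognizer", the softmax-style confidence, the fixed confidence threshold used as the selection rule, and all function names (`train_centroids`, `recognize`, `self_train`) are assumptions made purely for illustration. The sketch also assumes that the originally labeled data is kept in every retraining round, which the abstract does not specify.

```python
import math
from collections import defaultdict


def train_centroids(samples):
    """Train a toy nearest-centroid recognizer from (feature, label) pairs."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for x, label in samples:
        sums[label] += x
        counts[label] += 1
    return {label: sums[label] / counts[label] for label in sums}


def recognize(model, x):
    """Return (label, confidence) for feature x.

    Confidence is the winning label's share of exp(-distance) weights,
    a hypothetical stand-in for a real recognition confidence.
    """
    weights = {label: math.exp(-abs(x - c)) for label, c in model.items()}
    total = sum(weights.values())
    best = max(weights, key=weights.get)
    return best, weights[best] / total


def self_train(labeled, unlabeled, threshold=0.8, rounds=3):
    """Retrain repeatedly on labeled data plus confidently recognized samples."""
    model = train_centroids(labeled)
    for _ in range(rounds):
        pseudo_labeled = []
        for x in unlabeled:
            label, confidence = recognize(model, x)
            # Assumed selection rule: keep only samples whose
            # recognition confidence reaches the threshold.
            if confidence >= threshold:
                pseudo_labeled.append((x, label))
        # Assumption: the original labeled set is kept in each round.
        model = train_centroids(labeled + pseudo_labeled)
    return model


if __name__ == "__main__":
    labeled = [(0.0, "a"), (10.0, "b")]
    unlabeled = [1.0, 2.0, 9.0, 8.0, 5.0]
    print(self_train(labeled, unlabeled, threshold=0.8, rounds=2))
```

In this toy setting the ambiguous sample at 5.0 is never selected, because both classes are equally likely for it. A different selection rule, such as keeping a fixed fraction of the most confident samples, would only change the `if` condition inside `self_train`.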