Similarity-based training set acquisition for continuous handwriting recognition

  • Authors:
  • Jerzy Sas;Urszula Markowska-Kaczmar

  • Affiliations:
  • Wroclaw University of Technology, Wyb. Wyspianskiego 27, 50-370 Wroclaw, Poland;Wroclaw University of Technology, Wyb. Wyspianskiego 27, 50-370 Wroclaw, Poland

  • Venue:
  • Information Sciences: an International Journal
  • Year:
  • 2012

Abstract

This paper considers the problem of segmenting continuous handwriting into individual characters. The ultimate aim is to create a set of isolated character images to be used as a training set for a writer-dependent handwriting recognizer. An analytic approach is applied, in which word recognition is based on the classification of individual characters. The input to the proposed segmentation method is a handwritten text image consisting of known words. As a preliminary step, images of isolated words are over-segmented into sequences of graphemes. The method then proceeds in three stages. In the first stage, a genetic algorithm is used to create a set of segmentation variants that are likely to correspond to actual characters. The fitness function is based on the similarity of images within subsets of images of the same character. In the second stage, the set of segmentation variants obtained from the last generation of the genetic algorithm is refined by applying a sequence of small segment boundary displacements that increase the similarity of images within sets of the same character. In the third stage, the most typical character prototypes are selected and fixed in the word images. The segmentation of the remaining word fragments is achieved by maximizing their similarity to the fixed character prototypes. The accuracy of handwritten text recognition using the character images acquired after each stage was evaluated experimentally. Experiments with continuous handwriting recognition show that each stage improves word recognition accuracy.
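
The similarity-based fitness idea from the first stage can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the image similarity measure or the chromosome encoding, so everything below is an assumption made for illustration. Each grapheme is represented by a hypothetical feature vector, a candidate segmentation is encoded as a tuple of group sizes per word, a character "image" is approximated by the mean of its grapheme vectors, and similarity is taken as 1 / (1 + Euclidean distance).

```python
from itertools import combinations
from math import dist


def split_by_sizes(graphemes, sizes):
    """Group consecutive graphemes into segments of the given sizes."""
    if any(s < 1 for s in sizes) or sum(sizes) != len(graphemes):
        raise ValueError("sizes must be positive and cover all graphemes")
    segments, start = [], 0
    for size in sizes:
        segments.append(graphemes[start:start + size])
        start += size
    return segments


def segment_descriptor(segment):
    """Toy stand-in for a character image: component-wise mean of the vectors."""
    count = len(segment)
    return [sum(column) / count for column in zip(*segment)]


def similarity(a, b):
    """Assumed similarity measure: 1.0 for identical descriptors, lower otherwise."""
    return 1.0 / (1.0 + dist(a, b))


def fitness(words, segmentation):
    """Mean pairwise similarity within sets of samples of the same character.

    words: list of (text, graphemes) pairs, where text is the known word.
    segmentation: one tuple of group sizes per word, one size per character.
    """
    samples_by_char = {}
    for (text, graphemes), sizes in zip(words, segmentation):
        if len(sizes) != len(text):
            raise ValueError("one group size is required per character")
        for char, segment in zip(text, split_by_sizes(graphemes, sizes)):
            samples_by_char.setdefault(char, []).append(segment_descriptor(segment))
    scores = [
        similarity(a, b)
        for samples in samples_by_char.values()
        for a, b in combinations(samples, 2)
    ]
    return sum(scores) / len(scores) if scores else 0.0


# Hypothetical data: two known words, each over-segmented into three graphemes.
words = [
    ("ab", [[0.0, 0.0], [0.0, 0.0], [1.0, 1.0]]),
    ("ba", [[1.0, 1.0], [0.0, 0.0], [0.0, 0.0]]),
]
good = [(2, 1), (1, 2)]  # boundaries that match the toy characters
bad = [(1, 2), (1, 2)]   # second word is cut in the wrong place
```

In this toy setting, `fitness(words, good)` is higher than `fitness(words, bad)`, so a search over segmentation variants, genetic or otherwise, would prefer the boundaries that make samples of the same character look alike.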