Semi-automatic training sets acquisition for handwriting recognition

  • Authors: Jerzy Sas; Urszula Markowska-Kaczmar

  • Affiliations: Wroclaw University of Technology, Applied Informatics Institute, Wroclaw, Poland (both authors)

  • Venue: CAIP'07 Proceedings of the 12th International Conference on Computer Analysis of Images and Patterns
  • Year: 2007

Abstract

This paper describes a method of semi-automatic training set acquisition for character classifiers used in cursive handwriting recognition. The training set consists of character samples extracted from a training corpus by segmentation. The method first splits each word image from the corpus into a sequence of graphemes. A set of candidate segmentation variants is then derived with an evolutionary algorithm; each segmentation variant determines how the grapheme sequence of every word is subdivided into subsequences corresponding to consecutive letters. Segmentation variants are modeled as chromosomes in a population. Next, each segmentation variant from the final population is tuned in an iterative process, and the best chromosome is selected. The character samples produced by the segmentation that the selected chromosome encodes are then grouped into sets corresponding to the letters of the alphabet. Finally, the samples that stand out most from their sets (outliers) are rejected so as to maximize the word recognition accuracy obtained with a character classifier trained on the reduced sample set.
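
To make the notion of a segmentation variant concrete, the following is a minimal sketch of how one chromosome might be decoded into per-letter character samples. The abstract does not specify the chromosome encoding, so the representation used here (one list of grapheme counts per word, one count per letter) is an assumption, as are the names `split_by_counts` and `group_samples`. Graphemes are stood in for by plain strings; the evolutionary search, the iterative tuning, and the final sample rejection are not shown.

```python
from collections import defaultdict


def split_by_counts(graphemes, counts):
    """Cut a grapheme sequence into consecutive runs, one run per letter.

    counts[i] is the number of graphemes assigned to the i-th letter
    (an assumed encoding, not taken from the paper).
    """
    if any(c < 1 for c in counts) or sum(counts) != len(graphemes):
        raise ValueError("counts must be positive and cover every grapheme")
    runs, start = [], 0
    for c in counts:
        runs.append(graphemes[start:start + c])
        start += c
    return runs


def group_samples(corpus, chromosome):
    """Decode one segmentation variant into character samples per letter.

    corpus: list of (word_text, grapheme_sequence) pairs.
    chromosome: one list of counts per word, in corpus order.
    """
    if len(chromosome) != len(corpus):
        raise ValueError("chromosome needs one gene group per word")
    samples = defaultdict(list)
    for (text, graphemes), counts in zip(corpus, chromosome):
        if len(counts) != len(text):
            raise ValueError("one count per letter is required")
        for letter, run in zip(text, split_by_counts(graphemes, counts)):
            samples[letter].append(run)
    return dict(samples)


# Toy corpus: two words, with graphemes represented by placeholder labels.
corpus = [("at", ["g0", "g1", "g2"]), ("ta", ["g3", "g4", "g5", "g6"])]
chromosome = [[2, 1], [1, 3]]
samples = group_samples(corpus, chromosome)
```

With this toy input, the letter "a" receives the runs `["g0", "g1"]` and `["g4", "g5", "g6"]`, and "t" receives `["g2"]` and `["g3"]`. A different chromosome over the same corpus yields a different set of character samples, which is what the evolutionary search would compare.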