Semi-automatic training sets acquisition for handwriting recognition

Authors:
Jerzy Sas;Urszula Markowska-Kaczmar
Affiliations:
Wroclaw University of Technology, Applied Informatics Institute, Wroclaw, Poland;Wroclaw University of Technology, Applied Informatics Institute, Wroclaw, Poland
Venue:
CAIP'07 Proceedings of the 12th international conference on Computer analysis of images and patterns
Year:
2007

Abstract

This paper describes a semi-automatic method of acquiring training sets for the character classifiers used in cursive handwriting recognition. The training set consists of character samples extracted from a training corpus by segmentation. The method first splits each word image in the corpus into a sequence of graphemes. An evolutionary algorithm then elicits a set of candidate segmentation variants, where each variant determines how the grapheme sequences of words are subdivided into subsequences corresponding to consecutive letters. Segmentation variants are modeled by a population of chromosomes. Next, each segmentation variant in the final population is tuned in an iterative process, and the best chromosome is selected. The character samples produced by applying the segmentation encoded in the selected chromosome are then grouped into sets corresponding to the letters of the alphabet. Finally, the most outlying samples are rejected so as to maximize the word recognition accuracy obtained with a character classifier trained on the reduced sample set.
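
As an illustration only, the idea of a segmentation variant can be sketched in Python. The abstract does not give the chromosome encoding, the genetic operators, or the fitness function, so everything below is an assumption made for the sake of the example: a word's segmentation is encoded as a list of positive counts (graphemes per letter), a chromosome is one such list per corpus word, and the function names random_segmentation, apply_segmentation, mutate, and collect_samples are hypothetical. Grapheme extraction, fitness evaluation, iterative tuning, and outlier rejection are not shown.

```python
import random


def random_segmentation(n_graphemes, n_letters, rng):
    """Return n_letters positive counts that sum to n_graphemes.

    Assumed encoding (not taken from the paper): counts[i] is the number
    of consecutive graphemes assigned to the i-th letter of the word.
    """
    if n_letters < 1 or n_graphemes < n_letters:
        raise ValueError("need at least one grapheme per letter")
    # Pick n_letters - 1 distinct cut points among the gaps between graphemes.
    cuts = sorted(rng.sample(range(1, n_graphemes), n_letters - 1))
    bounds = [0] + cuts + [n_graphemes]
    return [b - a for a, b in zip(bounds, bounds[1:])]


def apply_segmentation(graphemes, counts):
    """Split a grapheme sequence into consecutive per-letter groups."""
    if sum(counts) != len(graphemes):
        raise ValueError("counts must cover the whole grapheme sequence")
    groups, start = [], 0
    for c in counts:
        groups.append(graphemes[start:start + c])
        start += c
    return groups


def mutate(counts, rng):
    """Illustrative mutation: move one grapheme across a random boundary.

    The result keeps the same total and at least one grapheme per letter;
    if the donor letter has only one grapheme, the counts are unchanged.
    """
    counts = list(counts)
    if len(counts) < 2:
        return counts
    i = rng.randrange(len(counts) - 1)
    src, dst = (i, i + 1) if rng.random() < 0.5 else (i + 1, i)
    if counts[src] > 1:
        counts[src] -= 1
        counts[dst] += 1
    return counts


def collect_samples(words, chromosome):
    """Group character samples by letter.

    words: list of (text, graphemes) pairs from the training corpus.
    chromosome: one segmentation (list of counts) per word.
    """
    samples = {}
    for (text, graphemes), counts in zip(words, chromosome):
        groups = apply_segmentation(graphemes, counts)
        for letter, group in zip(text, groups):
            samples.setdefault(letter, []).append(group)
    return samples


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy corpus: grapheme labels stand in for grapheme images.
    words = [("an", ["g0", "g1", "g2", "g3"]), ("na", ["h0", "h1", "h2"])]
    chromosome = [random_segmentation(len(g), len(t), rng) for t, g in words]
    print(collect_samples(words, chromosome))
```

In this sketch a chromosome fixes, for every corpus word, which graphemes belong to which letter, so grouping the resulting samples by letter follows directly from applying it. A real implementation would additionally need a fitness measure to compare chromosomes, which the abstract does not specify.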