Evaluating Retraining Rules for Semi-Supervised Learning in Neural Network Based Cursive Word Recognition

Authors:
Volkmar Frinken; Horst Bunke
Venue:
ICDAR '09 Proceedings of the 2009 10th International Conference on Document Analysis and Recognition
Year:
2009

Cited by (4):

Self-training Strategies for Handwriting Word Recognition
ICDM '09 Proceedings of the 9th Industrial Conference on Advances in Data Mining. Applications and Theoretical Aspects

Self-training for handwritten text line recognition
CIARP'10 Proceedings of the 15th Iberoamerican congress conference on Progress in pattern recognition, image analysis, computer vision, and applications

Transcription alignment of Latin manuscripts using hidden Markov models
Proceedings of the 2011 Workshop on Historical Document Imaging and Processing

Pattern classification and clustering: A review of partially supervised learning approaches
Pattern Recognition Letters

Abstract

Training a system to recognize handwritten words requires a large amount of data together with correct transcriptions. However, creating such a training set, including the ground truth, is tedious and costly. One way of reducing the high cost of acquiring labeled training data is to exploit unlabeled data, which can be gathered easily. Making use of both labeled and unlabeled data is known as semi-supervised learning. One of the most general forms of semi-supervised learning is self-training, where a recognizer iteratively retrains itself on its own output on new, unlabeled data. In this paper we propose to apply semi-supervised learning, and in particular self-training, to the problem of cursive handwritten word recognition. The special focus of the paper is on retraining rules that define which data are actually used in the retraining phase. In a series of experiments it is shown that the performance of a neural network based recognizer can be significantly improved through the use of unlabeled data and self-training if appropriate retraining rules are applied.
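
The self-training loop and the role of a retraining rule can be illustrated with a minimal sketch. The abstract does not state which rules the paper evaluates, so the two rules below (accept every self-labeled sample, or accept only samples above a confidence threshold) are assumptions chosen for illustration. A toy nearest-centroid classifier on synthetic 2-D points stands in for the neural network recognizer, and all function names and data are hypothetical.

```python
import math
import random


def train_centroids(samples):
    """Fit a toy recognizer: one mean vector per label (stand-in for the neural network)."""
    sums, counts = {}, {}
    for x, y in samples:
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [v / counts[y] for v in acc] for y, acc in sums.items()}


def predict(centroids, x):
    """Return (label, confidence); confidence is a softmax over negative distances."""
    scores = {y: math.exp(-math.dist(x, c)) for y, c in centroids.items()}
    label = max(scores, key=scores.get)
    return label, scores[label] / sum(scores.values())


def self_train(labeled, unlabeled, rule, iterations=3):
    """Retrain repeatedly on the labeled data plus self-labeled data accepted by `rule`."""
    model = train_centroids(labeled)
    for _ in range(iterations):
        pseudo = []
        for x in unlabeled:
            label, conf = predict(model, x)
            if rule(conf):  # the retraining rule decides what enters the next round
                pseudo.append((x, label))
        model = train_centroids(labeled + pseudo)
    return model


def use_everything(conf):
    """Assumed rule 1: retrain on every self-labeled sample."""
    return True


def make_threshold_rule(threshold):
    """Assumed rule 2: retrain only on samples recognized with high confidence."""
    return lambda conf: conf >= threshold


def draw(center, n):
    """Sample n synthetic 2-D points around `center` (illustrative data only)."""
    return [[random.gauss(center[0], 1.0), random.gauss(center[1], 1.0)] for _ in range(n)]


random.seed(0)
labeled = [(x, "a") for x in draw((0, 0), 3)] + [(x, "b") for x in draw((4, 4), 3)]
unlabeled = draw((0, 0), 50) + draw((4, 4), 50)

model_all = self_train(labeled, unlabeled, use_everything)
model_conf = self_train(labeled, unlabeled, make_threshold_rule(0.9))
```

Passing the rule as a function keeps the loop unchanged while the selection criterion varies, which mirrors the abstract's framing that the choice of retraining data is the variable under study.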