Self-training Strategies for Handwriting Word Recognition

Authors:
Volkmar Frinken; Horst Bunke
Affiliations:
Institute of Computer Science and Applied Mathematics, University of Bern, Bern, Switzerland CH-3012; Institute of Computer Science and Applied Mathematics, University of Bern, Bern, Switzerland CH-3012
Venue:
ICDM '09 Proceedings of the 9th Industrial Conference on Advances in Data Mining. Applications and Theoretical Aspects
Year:
2009

Abstract

Handwriting recognition is an emerging subfield of human-computer interaction with many potential industrial applications, e.g. in postal automation, bank check processing, and automatic form reading. Training a recognizer, however, requires a substantial number of training examples together with their corresponding ground truth, which must be created by humans. Semi-supervised learning, in which text both with and without transcription is used for training, offers a promising way to significantly reduce this effort and hence cut system development costs. To date, however, there is no straightforward and established method of semi-supervised learning, particularly not for handwriting recognition. In the self-training approach, an initially trained recognition system creates a new training set from unlabeled data, and a new recognizer is then trained on this set. The training set is built by selecting elements from the unlabeled set according to their recognition confidence, so the success of self-training depends crucially on the data selected. In this paper, we test and compare different rules for selecting new training data for single word recognition, both with and without additional language information in the form of a dictionary. We demonstrate that it is possible to substantially increase the recognition accuracy of both systems.
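
The self-training loop described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' system: the nearest-centroid "recognizer", the distance-ratio confidence measure, the fixed threshold rule, the number of rounds, and all function names (`train_centroids`, `recognize`, `self_train`) are assumptions made purely for illustration. The sketch also assumes that the original labeled data is kept in every retraining round, which the abstract does not specify.

```python
from typing import Dict, List, Sequence, Tuple

Sample = Tuple[float, ...]        # hypothetical feature vector for one word image
Labeled = Tuple[Sample, str]      # (features, word label)
Model = Dict[str, Sample]         # toy recognizer: one mean vector per word class


def train_centroids(data: Sequence[Labeled]) -> Model:
    """Toy stand-in for training a recognizer: average the features per label."""
    sums: Dict[str, List[float]] = {}
    counts: Dict[str, int] = {}
    for x, y in data:
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: tuple(v / counts[y] for v in acc) for y, acc in sums.items()}


def recognize(model: Model, x: Sample) -> Tuple[str, float]:
    """Return (best label, confidence); confidence lies in [0.5, 1.0]."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(x, c)) ** 0.5, y)
        for y, c in model.items()
    )
    best_d, best_y = dists[0]
    if len(dists) == 1:
        return best_y, 1.0
    second_d = dists[1][0]
    total = best_d + second_d
    # Assumed confidence measure: 0.5 means the two best classes tie,
    # 1.0 means the sample sits exactly on the best centroid.
    conf = 0.5 if total == 0 else second_d / total
    return best_y, conf


def self_train(labeled: Sequence[Labeled], unlabeled: Sequence[Sample],
               threshold: float = 0.8, rounds: int = 3) -> Model:
    """Retrain repeatedly on labeled data plus confidently recognized samples."""
    model = train_centroids(labeled)
    for _ in range(rounds):
        pseudo: List[Labeled] = []
        for x in unlabeled:
            label, conf = recognize(model, x)
            if conf >= threshold:      # selection rule: simple confidence threshold
                pseudo.append((x, label))
        # Assumption: the labeled set is always retained alongside pseudo-labels.
        model = train_centroids(list(labeled) + pseudo)
    return model


if __name__ == "__main__":
    labeled = [((0.0,), "a"), ((10.0,), "b")]
    unlabeled = [(1.0,), (9.0,), (5.0,)]   # the last sample is ambiguous
    print(self_train(labeled, unlabeled))
```

In this toy run, the ambiguous sample at `5.0` never passes the threshold, so it never contributes to retraining. Swapping the `if conf >= threshold` test for another criterion (for example, accepting everything, or only the top-ranked fraction) is where different selection rules would plug in.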