Keyword spotting for self-training of BLSTM NN based handwriting recognition systems

Authors:
Volkmar Frinken;Andreas Fischer;Markus Baumgartner;Horst Bunke
Venue:
Pattern Recognition
Year:
2014

Abstract

The automatic transcription of unconstrained, continuous handwritten text requires well-trained recognition systems. The semi-supervised paradigm uses unlabeled data in addition to labeled data in the learning process. Unlabeled data can be gathered at little or no cost, so it has the potential to reduce the need for labeling training data, a tedious and costly process. Given a weak initial recognizer trained on labeled data, self-training can be used to recognize unlabeled data and add words that were recognized with high confidence to the training set for re-training. This process is not trivial: the elements to be added to the training set must be selected with great care. In this paper, we propose to use a bidirectional long short-term memory (BLSTM) neural network handwriting recognition system for keyword spotting in order to select new elements. A set of experiments shows the high potential of self-training for bootstrapping handwriting recognition systems, for both modern and historical handwriting, and demonstrates the benefits of using keyword spotting over previously published self-training schemes.
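
The generic self-training loop outlined in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' system: `NearestCentroid` is a hypothetical toy stand-in for the BLSTM recognizer, the confidence measure and the `threshold` and `max_rounds` parameters are illustrative choices, and selection here is a plain confidence threshold rather than the keyword-spotting-based selection the paper proposes.

```python
import math
from typing import Dict, List, Sequence, Tuple

Sample = Tuple[float, ...]


class NearestCentroid:
    """Toy recognizer (hypothetical stand-in; NOT a BLSTM network)."""

    def fit(self, xs: Sequence[Sample], ys: Sequence[str]) -> "NearestCentroid":
        sums: Dict[str, List[float]] = {}
        counts: Dict[str, int] = {}
        for x, y in zip(xs, ys):
            acc = sums.setdefault(y, [0.0] * len(x))
            for i, value in enumerate(x):
                acc[i] += value
            counts[y] = counts.get(y, 0) + 1
        # One centroid (mean vector) per class label.
        self.centroids = {
            y: tuple(v / counts[y] for v in acc) for y, acc in sums.items()
        }
        return self

    def predict_with_confidence(self, x: Sample) -> Tuple[str, float]:
        # Assumes at least two classes. Confidence lies in [0.5, 1.0]:
        # 1.0 when x sits on the nearest centroid, 0.5 when it is
        # equidistant from the two nearest centroids.
        dists = sorted((math.dist(x, c), y) for y, c in self.centroids.items())
        (d0, label), (d1, _) = dists[0], dists[1]
        conf = 1.0 - d0 / (d0 + d1) if d0 + d1 > 0 else 0.5
        return label, conf


def self_train(
    labeled_x: Sequence[Sample],
    labeled_y: Sequence[str],
    unlabeled: Sequence[Sample],
    threshold: float = 0.9,  # illustrative value, not from the paper
    max_rounds: int = 5,     # illustrative value, not from the paper
):
    """Repeatedly add confidently recognized samples and retrain."""
    train_x, train_y = list(labeled_x), list(labeled_y)
    pool = list(unlabeled)
    model = NearestCentroid().fit(train_x, train_y)
    for _ in range(max_rounds):
        remaining = []
        added = 0
        for x in pool:
            label, conf = model.predict_with_confidence(x)
            if conf >= threshold:
                # Treat the recognizer's own output as the label.
                train_x.append(x)
                train_y.append(label)
                added += 1
            else:
                remaining.append(x)
        if added == 0:
            break  # nothing confident enough; stop early
        pool = remaining
        model = NearestCentroid().fit(train_x, train_y)
    return model, train_x, train_y, pool
```

Calling `self_train([(0.0,), (10.0,)], ["a", "b"], [(1.0,), (9.0,), (5.0,)], threshold=0.8)` adds the two samples near the class centers to the training set and leaves the ambiguous midpoint in the unlabeled pool. The threshold controls the trade-off the abstract alludes to: a low value admits more wrongly labeled samples into re-training, while a high value adds few new samples.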