Improving handwritten keyword spotting with self-training

Authors: Volkmar Frinken, Andreas Fischer, Horst Bunke
Affiliation (all authors): Institute for Computer Science and Applied Mathematics, Bern, Switzerland
Venue: Proceedings of the 2011 ACM Symposium on Applied Computing
Year: 2011

Abstract

Keyword spotting is the task of retrieving all instances of a given keyword in a set of documents. In this paper we consider keyword spotting in handwritten text, a difficult problem because of the wide variety of writing styles. Recently, learning-based keyword spotting systems have been shown to outperform traditional approaches, at the cost of requiring large amounts of training data. The training data must be labeled manually, which is tedious and time-consuming. We propose to exploit unlabeled data via semi-supervised learning to reduce the need for labeled data when training a keyword spotting system. We demonstrate, on historical as well as modern handwritten text, that the performance of a learning-based keyword spotting system can be dramatically increased using this approach.
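
The abstract does not describe the training procedure in detail. As a rough illustration of what self-training generally looks like, the following Python sketch retrains a model on its own confident predictions for unlabeled data. It is not the authors' system: the nearest-centroid classifier, the distance-based confidence score, the `threshold` value, and all function names are hypothetical stand-ins chosen to keep the example small.

```python
import math

def centroids(samples):
    """Toy model (an assumption): mean feature vector per label.
    `samples` is a list of (vector, label) pairs."""
    sums, counts = {}, {}
    for vec, label in samples:
        acc = sums.setdefault(label, [0.0] * len(vec))
        for i, v in enumerate(vec):
            acc[i] += v
        counts[label] = counts.get(label, 0) + 1
    return {lab: [s / counts[lab] for s in acc] for lab, acc in sums.items()}

def predict(model, vec):
    """Return (label, confidence). Confidence is a softmax over
    negative Euclidean distances to the class centroids."""
    weights = {lab: math.exp(-math.dist(vec, c)) for lab, c in model.items()}
    total = sum(weights.values())
    best = max(weights, key=weights.get)
    return best, weights[best] / total

def self_train(labeled, unlabeled, threshold=0.8, max_rounds=10):
    """Generic self-training loop: train, pseudo-label the unlabeled pool,
    keep only confident predictions, retrain, and repeat."""
    train = list(labeled)
    pool = list(unlabeled)
    model = centroids(train)
    for _ in range(max_rounds):
        confident, rest = [], []
        for vec in pool:
            label, conf = predict(model, vec)
            (confident if conf >= threshold else rest).append((vec, label))
        if not confident:
            break  # nothing new to learn from; stop early
        train.extend(confident)            # add pseudo-labeled samples
        pool = [vec for vec, _ in rest]    # uncertain samples stay unlabeled
        model = centroids(train)           # retrain on the enlarged set
    return model, train, pool
```

Calling `self_train(labeled, unlabeled)` returns the final model, the enlarged training set including pseudo-labeled samples, and whatever part of the pool never reached the confidence threshold. The threshold trades off how much unlabeled data is used against the risk of reinforcing wrong pseudo-labels.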