Handwritten word-spotting using hidden Markov models and universal vocabularies

Authors:
José A. Rodríguez-Serrano; Florent Perronnin
Affiliations:
Centre de Visió Per Computador (CVC), Universitat Autònoma de Barcelona, Edifici O Campus Bellaterra, 08193 Bellaterra, Spain; Xerox Research Centre Europe, 6 Chemin de Maupertuis, 38240 Meylan, France
Venue:
Pattern Recognition
Year:
2009

Abstract

Handwritten word-spotting is traditionally viewed as an image matching task between one or multiple query word-images and a set of candidate word-images in a database. This is a typical instance of the query-by-example paradigm. In this article, we introduce a statistical framework for the word-spotting problem which employs hidden Markov models (HMMs) to model keywords and a Gaussian mixture model (GMM) for score normalization. We explore the use of two types of HMMs for the word modeling part: continuous HMMs (C-HMMs) and semi-continuous HMMs (SC-HMMs), i.e. HMMs with a shared set of Gaussians. We show on a challenging multi-writer corpus that the proposed statistical framework is always superior to a traditional matching system which uses dynamic time warping (DTW) for word-image distance computation. A very important finding is that the SC-HMM is superior when labeled training data is scarce (as low as one sample per keyword), thanks to the prior information which can be incorporated in the shared set of Gaussians.
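
To make the scoring idea concrete, the following is a minimal sketch of how a keyword SC-HMM and a background GMM could be combined into a normalized score. It is not the authors' implementation. The abstract does not give the normalization formula, so the per-frame log-likelihood ratio used here is an assumption, as are the diagonal covariances, the forward-algorithm likelihood, and every function and parameter name (`normalized_score`, `state_weights`, and so on). The key point it illustrates is that the SC-HMM states and the GMM share one pool of Gaussians, so only the per-state mixture weights and the transitions are keyword-specific.

```python
import numpy as np

def logsumexp(a, axis):
    """Numerically stable log(sum(exp(a))) that tolerates all -inf slices."""
    m = np.max(a, axis=axis, keepdims=True)
    m = np.where(np.isfinite(m), m, 0.0)
    with np.errstate(divide="ignore"):
        return np.squeeze(m, axis=axis) + np.log(np.sum(np.exp(a - m), axis=axis))

def log_gauss_diag(X, means, variances):
    """Log-density of each frame under each shared diagonal Gaussian, shape (T, K)."""
    diff = X[:, None, :] - means[None, :, :]
    return -0.5 * (np.sum(diff ** 2 / variances, axis=2)
                   + np.sum(np.log(2.0 * np.pi * variances), axis=1))

def gmm_loglik(log_g, gmm_weights):
    """Background log-likelihood: frames are scored independently by the GMM."""
    return float(np.sum(logsumexp(log_g + np.log(gmm_weights), axis=1)))

def schmm_loglik(log_g, state_weights, pi, A):
    """Keyword log-likelihood via the forward algorithm in the log domain.

    Each state emits from the SAME K Gaussians; only the (S, K) weights differ.
    """
    with np.errstate(divide="ignore"):
        log_w, log_pi, log_A = np.log(state_weights), np.log(pi), np.log(A)
    # log_b[t, s] = log sum_k w[s, k] * N_k(x_t)
    log_b = logsumexp(log_g[:, None, :] + log_w[None, :, :], axis=2)
    alpha = log_pi + log_b[0]
    for t in range(1, log_b.shape[0]):
        alpha = logsumexp(alpha[:, None] + log_A, axis=0) + log_b[t]
    return float(logsumexp(alpha, axis=0))

def normalized_score(X, means, variances, gmm_weights, state_weights, pi, A):
    """Assumed normalization: per-frame log-likelihood ratio of keyword vs. background."""
    log_g = log_gauss_diag(X, means, variances)
    ratio = schmm_loglik(log_g, state_weights, pi, A) - gmm_loglik(log_g, gmm_weights)
    return ratio / X.shape[0]
```

Under these assumptions, `X` is a (T, D) sequence of feature vectors extracted from a candidate word-image, and a candidate would be accepted when the score exceeds a threshold. A one-state model whose weights equal the GMM weights scores exactly zero, which is a convenient sanity check.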