Finding words in alphabet soup: Inference on freeform character recognition for historical scripts

Authors:
Nicholas R. Howe; Shaolei Feng; R. Manmatha
Affiliations:
Department of Computer Science, Smith College, Northampton, MA 01063, USA; Department of Computer Science, University of Massachusetts, Amherst, MA 01003, USA; Department of Computer Science, University of Massachusetts, Amherst, MA 01003, USA
Venue:
Pattern Recognition
Year:
2009

Abstract

This paper develops word recognition methods for historical handwritten cursive and printed documents. It employs a powerful segmentation-free letter detection method based upon joint boosting with histograms of gradients as features. Efficient inference on an ensemble of hidden Markov models can select the most probable sequence of candidate character detections to recognize complete words in ambiguous handwritten text, drawing on character n-gram and physical separation models. Experiments with two corpora of handwritten historical documents show that this approach recognizes known words more accurately than previous efforts, and can also recognize out-of-vocabulary words.
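
The inference step described in the abstract can be illustrated with a small Viterbi-style dynamic program. The following is a minimal sketch under stated assumptions, not the authors' implementation: it uses a single left-to-right chain rather than an ensemble of hidden Markov models, and the function name `best_word`, the toy bigram table, the Gaussian separation penalty, and the parameters `mean_gap`, `sigma`, and `floor` are all hypothetical. Each candidate detection carries a horizontal position, a character label, and a detector score; the program picks the chain of detections that maximizes the sum of detector scores, character bigram log-probabilities, and separation penalties.

```python
import math


def best_word(detections, bigram_logp, mean_gap=10.0, sigma=3.0, floor=-5.0):
    """Pick the highest-scoring left-to-right chain of character detections.

    detections  : list of (x, char, score) tuples; score is a detector
                  log-odds value (hypothetical scale, higher is better).
    bigram_logp : dict mapping (prev_char, char) to a log-probability;
                  unseen pairs fall back to `floor` (an assumption).
    Returns (word, total_score).
    """
    if not detections:
        return "", 0.0

    dets = sorted(detections)          # order candidates by x position
    n = len(dets)
    best = [-math.inf] * n             # best score of a chain ending at j
    back = [None] * n                  # predecessor index for backtracking

    for j, (xj, cj, sj) in enumerate(dets):
        best[j] = sj                   # chain that starts at detection j
        for i in range(j):
            xi, ci, _ = dets[i]
            if xi >= xj:               # require strictly left-to-right order
                continue
            # Physical separation model: Gaussian log-penalty on the gap.
            sep = -0.5 * ((xj - xi - mean_gap) / sigma) ** 2
            # Character n-gram model (bigram here).
            lm = bigram_logp.get((ci, cj), floor)
            cand = best[i] + lm + sep + sj
            if cand > best[j]:
                best[j] = cand
                back[j] = i

    # Backtrack from the best-scoring final detection.
    j = max(range(n), key=best.__getitem__)
    total = best[j]
    chars = []
    while j is not None:
        chars.append(dets[j][1])
        j = back[j]
    return "".join(reversed(chars)), total


# Toy example: the detector slightly prefers 'o' at x=10, but the bigram
# model favors "ca" and "at", so the chain c-a-t wins overall.
bigrams = {
    ("c", "a"): math.log(0.5),
    ("a", "t"): math.log(0.5),
    ("c", "o"): math.log(0.3),
    ("o", "t"): math.log(0.2),
}
candidates = [(0, "c", 2.0), (10, "a", 1.0), (10, "o", 1.2), (20, "t", 2.0)]
word, score = best_word(candidates, bigrams)
print(word)  # cat
```

In this toy run the language model overrides the detector's local preference, which is the kind of disambiguation the abstract attributes to combining detections with n-gram and separation models. Because the search is over characters rather than a fixed lexicon, a chain like this can in principle spell words that were never seen as whole units.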