Training and evaluating techniques for handwriting recognition and retrieval is challenging because large ground-truthed datasets are difficult to create. This is especially true for historical handwritten datasets. In many instances the ground truth has to be created by manually transcribing each word, which is a very labor-intensive process. Transcriptions are sometimes available for some manuscripts, but they were created for other purposes, so correspondence at the word, line, or sentence level may not be available. To be useful for training and evaluation, a word-level correspondence must exist between the segmented handwritten word images and the ASCII transcriptions. Creating this correspondence, or alignment, is challenging because the segmentation is often error-prone and the ASCII transcription may also contain errors. Very little work has been done on aligning handwritten data to transcripts. Here, a novel Hidden Markov Model based automatic alignment algorithm is described and tested. The algorithm achieves an average alignment accuracy of about 72.8% when aligning whole pages at a time on a set of 70 pages of the George Washington collection. This outperforms, by about 12%, a dynamic time warping alignment algorithm previously reported in the literature and tested on the same collection.
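The abstract does not specify the model's features or topology, so the following is only a minimal sketch of the general idea of HMM-based alignment, not the paper's algorithm. It assumes that each transcript word is a hidden state, that each segmented word image is summarized by a single observation (its pixel width), and that Viterbi decoding finds the most likely monotonic assignment of images to words. The function name `viterbi_align`, the parameters `px_per_char` and `sigma`, the width-only emission model, and the transition probabilities are all hypothetical choices made for illustration.

```python
import math


def viterbi_align(widths, words, px_per_char=20.0, sigma=15.0, log_trans=None):
    """Align word-image widths to transcript words with Viterbi decoding.

    Hidden states are transcript word indices; observations are image widths.
    Returns, for each image, the index of the transcript word it is aligned to.
    All model parameters are illustrative assumptions.
    """
    if not widths or not words:
        return []
    if log_trans is None:
        # Jump size -> log-probability (assumed values):
        #   0 = stay on the same word (over-segmented image),
        #   1 = move to the next word,
        #   2 = skip a word (missed segment or transcript insertion).
        log_trans = {0: math.log(0.1), 1: math.log(0.8), 2: math.log(0.1)}

    def log_emit(width, word):
        # Gaussian log-likelihood (up to a constant) of the observed width,
        # assuming expected width is proportional to the character count.
        expected = px_per_char * len(word)
        return -0.5 * ((width - expected) / sigma) ** 2

    n_obs, n_states = len(widths), len(words)
    neg_inf = float("-inf")
    score = [[neg_inf] * n_states for _ in range(n_obs)]
    back = [[-1] * n_states for _ in range(n_obs)]

    # Assumption: the first image corresponds to the first transcript word.
    score[0][0] = log_emit(widths[0], words[0])

    for t in range(1, n_obs):
        for s in range(n_states):
            best, best_prev = neg_inf, -1
            for jump, lp in log_trans.items():
                p = s - jump
                if p < 0 or score[t - 1][p] == neg_inf:
                    continue
                cand = score[t - 1][p] + lp
                if cand > best:
                    best, best_prev = cand, p
            if best_prev >= 0:
                score[t][s] = best + log_emit(widths[t], words[s])
                back[t][s] = best_prev

    # Backtrace from the best-scoring final state.
    state = max(range(n_states), key=lambda s: score[-1][s])
    path = [state]
    for t in range(n_obs - 1, 0, -1):
        state = back[t][state]
        path.append(state)
    path.reverse()
    return path


# Usage: the segmenter missed the third word, so only four images exist.
transcript = ["the", "general", "orders", "of", "today"]
image_widths = [60, 140, 40, 100]
print(viterbi_align(image_widths, transcript))
```

A width-only emission model is far too weak for real manuscripts; it is used here only to keep the sketch short. The skip and stay transitions illustrate how an HMM can tolerate the segmentation and transcription errors that the abstract identifies as the main difficulty.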