Transcriptions of historical documents are a valuable source for extracting labeled handwriting images that can be used for training recognition systems. In this paper, we introduce the Saint Gall database that includes images as well as the transcription of a Latin manuscript from the 9th century written in Carolingian script. Although the available transcription is of high quality for a human reader, the spelling of the words is not accurate when compared with the handwriting image. Hence, the transcription poses several challenges for alignment regarding, e.g., line breaks, abbreviations, and capitalization. We propose an alignment system based on character Hidden Markov Models that can cope with these challenges and efficiently aligns complete document pages. On the Saint Gall database, we demonstrate that a considerable alignment accuracy can be achieved, even with weakly trained character models.
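To make the idea of HMM-based transcript alignment concrete, the following is a minimal sketch of Viterbi forced alignment. It is not the system described in the abstract. It assumes a simplified setting with one state per transcript character, a strictly left-to-right topology (stay in the current character or advance to the next), and a precomputed table of per-frame log-likelihoods. The function name `forced_align` and the score table `log_lik` are hypothetical and used only for illustration.

```python
from typing import List


def forced_align(log_lik: List[List[float]]) -> List[int]:
    """Assign each image frame to one transcript character.

    log_lik[t][j] is an assumed, precomputed log-likelihood of frame t
    under the model of the j-th transcript character. The path must start
    in character 0, end in the last character, and at every frame either
    stay in the same character or advance by exactly one.
    """
    n_frames = len(log_lik)
    n_states = len(log_lik[0])
    if n_frames < n_states:
        raise ValueError("need at least one frame per transcript character")

    neg_inf = float("-inf")
    score = [[neg_inf] * n_states for _ in range(n_frames)]
    back = [[0] * n_states for _ in range(n_frames)]
    score[0][0] = log_lik[0][0]  # the path is forced to start in character 0

    for t in range(1, n_frames):
        for j in range(n_states):
            stay = score[t - 1][j]
            advance = score[t - 1][j - 1] if j > 0 else neg_inf
            if advance > stay:
                best, prev = advance, j - 1
            else:
                best, prev = stay, j
            if best > neg_inf:  # skip states that cannot be reached yet
                score[t][j] = best + log_lik[t][j]
                back[t][j] = prev

    # Backtrace from the last character at the last frame.
    path = [n_states - 1]
    for t in range(n_frames - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path


# Toy example: five frames, two characters. The first three frames score
# well under character 0 and the last two under character 1.
toy_scores = [[0.0, -5.0], [0.0, -5.0], [0.0, -5.0], [-5.0, 0.0], [-5.0, 0.0]]
alignment = forced_align(toy_scores)
print(alignment)  # [0, 0, 0, 1, 1]
```

The returned list gives, for every frame, the index of the transcript character it is assigned to, so character boundaries fall wherever the index changes. A real system would obtain the scores from trained character models and would need additional handling for line breaks, abbreviations, and capitalization, which this sketch does not attempt.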