Language models are used in automatic transcription systems to resolve ambiguities. They do so by limiting the vocabulary of words that can be recognized and by estimating n-gram probabilities for the words in the given text. For historical documents, non-standardized spelling and the limited amount of written text make it difficult both to select the recognizable vocabulary and to compute the word probabilities. In this paper, we propose, for the transcription of historical Spanish text, to keep the n-gram corpus limited to a sample of the target text but to expand the vocabulary with words gathered from external resources. We analyze the performance of such a transcription system with external vocabularies of different sizes and demonstrate that the approach is applicable and that using up to 300 thousand external words significantly increases recognition accuracy.
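The idea of estimating n-gram counts from a small in-domain sample while enlarging the recognizable vocabulary can be illustrated with a minimal sketch. It assumes a bigram model with add-k smoothing, which is chosen here only for simplicity and is not stated in the abstract; the function name `build_bigram_model`, the parameter `k`, and the example words are all hypothetical.

```python
from collections import Counter


def build_bigram_model(sample_sentences, external_words, k=0.1):
    """Bigram model counted on the target sample only, with an expanded vocabulary.

    Assumption: add-k smoothing spreads a small probability mass over every
    vocabulary word, including external words never seen in the sample.
    """
    # Vocabulary = words in the target sample + external words + end marker.
    vocab = {w for s in sample_sentences for w in s} | set(external_words) | {"</s>"}

    context_counts = Counter()  # how often each history word occurs
    bigram_counts = Counter()   # how often each (history, word) pair occurs
    for sentence in sample_sentences:
        tokens = ["<s>"] + list(sentence) + ["</s>"]
        for prev, word in zip(tokens, tokens[1:]):
            context_counts[prev] += 1
            bigram_counts[(prev, word)] += 1

    size = len(vocab)

    def prob(prev, word):
        # Words outside the expanded vocabulary remain unrecognizable.
        if word not in vocab:
            return 0.0
        return (bigram_counts[(prev, word)] + k) / (context_counts[prev] + k * size)

    return vocab, prob


# Illustrative data only; these are not taken from the paper's corpus.
sample = [["el", "rey", "manda"], ["el", "rey", "dize"]]
external = ["reyna", "dixo"]
vocab, prob = build_bigram_model(sample, external)

print(prob("el", "rey"))        # seen in the sample: relatively high
print(prob("el", "reyna"))      # external word: small but non-zero
print(prob("el", "caballero"))  # outside the vocabulary: 0.0
```

In this sketch an external word becomes recognizable without any additional n-gram training text, at the cost of taking a little probability mass from the words observed in the sample.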