Language Model Integration for the Recognition of Handwritten Medieval Documents

Authors:
Markus Wüthrich;Marcus Liwicki;Andreas Fischer;Emanuel Indermühle;Horst Bunke;Gabriel Viehhauser;Michael Stolz
Venue:
ICDAR '09 Proceedings of the 2009 10th International Conference on Document Analysis and Recognition
Year:
2009

Abstract

Building recognition systems for historical documents is a difficult task, especially when it comes to medieval scripts. The difficulty stems mainly from the poor quality and the small quantity of the available data. In this paper we apply an HMM-based recognition system to medieval manuscripts from the 13th century written in Middle High German. The recognition system, which was originally developed for modern scripts, has been adapted to medieval scripts. Besides the data processing, one of the major challenges is to create a suitable language model. Because appropriate independent text corpora for medieval languages are lacking, the language model has to be built from a rather small number of manuscripts only. Given the small size of the corpus, optimizing the language model parameters can quickly lead to overfitting. In this paper we describe a strategy to integrate all available information into the language model and to optimize the language model parameters without suffering from this problem.
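
The abstract does not spell out the strategy itself, so the following is only a generic illustration of the overfitting issue it raises, not the authors' method. It is a minimal sketch, assuming a bigram language model interpolated with add-one smoothed unigrams, whose interpolation weight is chosen by cross-validation over the whole small corpus instead of on a single held-out split. The function names, the toy corpus of placeholder tokens, and the candidate weights are all hypothetical.

```python
import math
from collections import Counter

def train(sentences):
    """Count unigrams and bigrams over tokenized sentences."""
    uni, bi = Counter(), Counter()
    for s in sentences:
        toks = ["<s>"] + s          # "<s>" marks the sentence start
        uni.update(toks)
        bi.update(zip(toks, toks[1:]))
    return uni, bi

def log_prob(sentence, uni, bi, lam, vocab_size):
    """Log-probability under an interpolated bigram/unigram model.

    lam must be < 1 so that unseen bigrams keep a nonzero probability.
    """
    total = sum(uni.values())
    toks = ["<s>"] + sentence
    lp = 0.0
    for prev, word in zip(toks, toks[1:]):
        p_uni = (uni[word] + 1) / (total + vocab_size)   # add-one smoothing
        p_bi = bi[(prev, word)] / uni[prev] if uni[prev] else 0.0
        lp += math.log(lam * p_bi + (1 - lam) * p_uni)
    return lp

def pick_lambda(sentences, candidates, folds=3):
    """Choose the interpolation weight by k-fold cross-validation."""
    vocab_size = len({w for s in sentences for w in s}) + 1  # +1 for "<s>"
    best_lam, best_score = None, -math.inf
    for lam in candidates:
        score = 0.0
        for k in range(folds):
            held_out = sentences[k::folds]
            rest = [s for i, s in enumerate(sentences) if i % folds != k]
            uni, bi = train(rest)
            score += sum(log_prob(s, uni, bi, lam, vocab_size) for s in held_out)
        if score > best_score:
            best_lam, best_score = lam, score
    return best_lam

# Hypothetical toy corpus; the tokens are placeholders, not manuscript text.
corpus = [["a", "b", "c"], ["a", "b", "d"], ["a", "c", "b"],
          ["b", "c", "a"], ["a", "b", "c"], ["c", "a", "b"]]
candidates = [0.1, 0.5, 0.9]
best_lam = pick_lambda(corpus, candidates)
```

Because every sentence serves once as held-out data, the weight is tuned on the full corpus without ever being scored on text the model was trained on, which is one common way to limit overfitting when no independent corpus exists.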