Proceedings of the 2011 Joint Workshop on Multilingual OCR and Analytics for Noisy Unstructured Text Data
A design of a preprocessing framework for large database of historical documents
Proceedings of the 2011 Workshop on Historical Document Imaging and Processing
This paper presents a system for the automatic annotation of handwritten historical documents based on Markov models. The proposed system first extracts an XML schema that describes a specific domain, and then a mapping algorithm generates new XML schemas. The mapping algorithm takes two schemas as input: a reference schema and a specific schema. The XML schemas are generated using Markov models, and the same models are used to calculate the mapping efficiency. In the first model, the mapping increases with the number of nodes common to the input XML schemas. The mapping is pertinent when the number of common nodes exceeds $0.5\%$ of the Markov model states. In the second model, the mapping changes randomly with the number of common nodes, between $0.05$ and $0.4$.
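The pertinence criterion of the first model can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract does not say how nodes are compared, so the sketch assumes that two nodes are "common" when they share an element tag name. The function names (`node_names`, `common_node_ratio`, `is_pertinent`), the sample schemas, and the state count `n_states` are all hypothetical.

```python
import xml.etree.ElementTree as ET


def node_names(xml_text):
    """Return the set of element tag names found in an XML document."""
    root = ET.fromstring(xml_text)
    return {elem.tag for elem in root.iter()}


def common_node_ratio(reference_xml, specific_xml, n_states):
    """Number of shared tag names divided by the number of model states.

    Assumption: nodes are matched by tag name only; the abstract does not
    define the matching rule or how the states are counted.
    """
    common = node_names(reference_xml) & node_names(specific_xml)
    return len(common) / n_states


def is_pertinent(ratio, threshold=0.005):
    """Apply the 0.5% threshold quoted in the abstract (0.5% = 0.005)."""
    return ratio > threshold


# Hypothetical reference and specific schemas, for illustration only.
reference = "<letter><date/><author/><body/></letter>"
specific = "<letter><date/><place/></letter>"

# Hypothetical model with 8 states; "letter" and "date" are shared.
ratio = common_node_ratio(reference, specific, n_states=8)
print(ratio, is_pertinent(ratio))
```

Here two of the tag names are shared, so with the assumed 8 states the ratio is 0.25, well above the quoted threshold.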