Word-Based Adaptive OCR for Historical Books

Authors:
Vladimir Kluzner; Asaf Tzadok; Yuval Shimony; Eugene Walach; Apostolos Antonacopoulos
Venue:
ICDAR '09 Proceedings of the 2009 10th International Conference on Document Analysis and Recognition
Year:
2009

Citing 0
Cited 5

Introducing a new image dissimilarity measure with an application to character image clustering in degraded historical documents
DAS '10 Proceedings of the 9th IAPR International Workshop on Document Analysis Systems

Document: a useful level for facing noisy data
AND '10 Proceedings of the fourth workshop on Analytics for noisy unstructured text data

Data mining medieval documents by word spotting
Proceedings of the 2011 Workshop on Historical Document Imaging and Processing

IMPACT: centre of competence in text digitisation
Proceedings of the 2011 Workshop on Historical Document Imaging and Processing

An experimental workflow development platform for historical document digitisation and analysis
Proceedings of the 2011 Workshop on Historical Document Imaging and Processing

Abstract

This work proposes a new approach to the recognition of historical texts: an adaptive mechanism that automatically tunes itself to a specific book. The system clusters together all similar words in a book or text and handles each entire class simultaneously. The paper describes the architecture of such a system and the new algorithms developed for robust word image comparison (including registration, optical-flow-based distortion compensation, and adaptive binarization). Results for a large dataset are also presented, demonstrating a recognition improvement of over 23%.
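
The word-clustering idea can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the comparison measure or the clustering procedure, so everything below is an assumption made for illustration. Word images are taken to be small binary grids, registration is reduced to a brute-force search over small translations, dissimilarity is the Jaccard distance between ink pixels, clustering is a greedy single pass, and "handling the entire class" is modelled as a majority vote over per-word OCR outputs. The names `dissimilarity`, `cluster_words`, `vote_labels`, `max_shift`, and `threshold` are hypothetical. Optical flow compensation and adaptive binarization are not covered.

```python
from collections import Counter
from typing import List, Set, Tuple

Image = List[List[int]]  # binary word image: 1 = ink, 0 = background


def ink(img: Image) -> Set[Tuple[int, int]]:
    """Return the set of (row, column) coordinates that contain ink."""
    return {(y, x) for y, row in enumerate(img) for x, v in enumerate(row) if v}


def dissimilarity(a: Image, b: Image, max_shift: int = 2) -> float:
    """Smallest Jaccard distance between the ink of a and b over small shifts.

    Assumption: registration is approximated by trying every translation of b
    within +/- max_shift pixels and keeping the best match.
    """
    pa, pb = ink(a), ink(b)
    if not pa and not pb:
        return 0.0
    best = 1.0
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            shifted = {(y + dy, x + dx) for y, x in pb}
            # Pixels present in exactly one image, relative to all ink pixels.
            d = len(pa ^ shifted) / len(pa | shifted)
            best = min(best, d)
    return best


def cluster_words(images: List[Image], threshold: float = 0.3) -> List[List[int]]:
    """Greedy clustering: join the first cluster whose representative is close.

    Each cluster is a list of image indices; its first index is the representative.
    """
    clusters: List[List[int]] = []
    for i, img in enumerate(images):
        for members in clusters:
            if dissimilarity(images[members[0]], img) <= threshold:
                members.append(i)
                break
        else:
            clusters.append([i])
    return clusters


def vote_labels(clusters: List[List[int]], ocr: List[str]) -> List[str]:
    """Give every word in a cluster the most common OCR output of that cluster.

    Assumption: a majority vote stands in for class-level handling.
    """
    result = list(ocr)
    for members in clusters:
        winner = Counter(ocr[i] for i in members).most_common(1)[0][0]
        for i in members:
            result[i] = winner
    return result


if __name__ == "__main__":
    word = [[0, 1, 1, 0], [0, 1, 1, 0], [0, 0, 0, 0]]
    word_shifted = [[0, 0, 1, 1], [0, 0, 1, 1], [0, 0, 0, 0]]
    other = [[1, 0, 0, 1], [0, 0, 0, 0], [1, 0, 0, 1]]
    images = [word, word_shifted, word, other]
    clusters = cluster_words(images)
    print(clusters)
    print(vote_labels(clusters, ["the", "tbe", "the", "and"]))
```

In this toy run the shifted copy joins the same cluster as the original because the translation search removes the offset, and the single misread in that cluster is outvoted by the other two. A real system would need a far more robust comparison to cope with the degradation found in historical books, which is what the registration, distortion compensation, and binarization algorithms mentioned in the abstract address.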