In this paper, we propose a new comprehensive methodology for evaluating the performance of recognition techniques applied to noisy historical documents. We aim to evaluate not only the final recognition result but also the main intermediate stages of text line, word and character segmentation. For this purpose, we efficiently create text line, word and character segmentation ground truth, guided by the transcription of the historical documents. The proposed methodology consists of (i) a semi-automatic procedure that detects the text line, word and character segmentation ground-truth regions using the correct document transcription, and (ii) the calculation of appropriate evaluation metrics that measure the performance of the final OCR result as well as of the intermediate segmentation stages. The semi-automatic procedure for detecting the ground-truth regions has been evaluated and shown to be efficient and time-saving: experimental results show that it reduces the time needed to create text line, word and character segmentation ground truth by more than 90%. A detailed experiment in which a commercial OCR engine is applied to a historical book is also presented.
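The abstract does not say which metrics are computed in step (ii). As an illustration only, the sketch below assumes two common choices. For segmentation it uses the pixel-overlap MatchScore and one-to-one match counting in the style of the ICDAR 2007 Handwriting Segmentation Contest, simplified to one-to-one matches, which yields a detection rate (DR), recognition accuracy (RA) and F-measure (FM). For the OCR output it uses character accuracy derived from the Levenshtein edit distance. The function names, the set-of-pixels region representation and the 0.9 acceptance threshold are hypothetical and are not taken from the paper.

```python
from typing import List, Set, Tuple

Pixel = Tuple[int, int]  # (row, col); hypothetical region representation
Region = Set[Pixel]


def match_score(gt: Region, res: Region, ink: Region) -> float:
    """Overlap of a ground-truth and a result region, counted on ink pixels only."""
    union = (gt | res) & ink
    if not union:
        return 0.0
    return len(gt & res & ink) / len(union)


def segmentation_scores(gt_regions: List[Region], res_regions: List[Region],
                        ink: Region, threshold: float = 0.9):
    """Return (DR, RA, FM) from one-to-one matches; threshold is an assumed value."""
    matched: Set[int] = set()  # indices of result regions already paired
    o2o = 0
    for g in gt_regions:
        for idx, r in enumerate(res_regions):
            if idx not in matched and match_score(g, r, ink) >= threshold:
                matched.add(idx)
                o2o += 1
                break
    dr = o2o / len(gt_regions) if gt_regions else 0.0
    ra = o2o / len(res_regions) if res_regions else 0.0
    fm = 2 * dr * ra / (dr + ra) if dr + ra > 0 else 0.0
    return dr, ra, fm


def levenshtein(a: str, b: str) -> int:
    """Minimum number of insertions, deletions and substitutions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution or match
        prev = cur
    return prev[-1]


def char_accuracy(ground_truth: str, ocr_text: str) -> float:
    """(n - errors) / n; may be negative if the OCR output adds many characters."""
    if not ground_truth:
        raise ValueError("ground truth must be non-empty")
    n = len(ground_truth)
    return (n - levenshtein(ground_truth, ocr_text)) / n


if __name__ == "__main__":
    ink = {(r, c) for r in range(4) for c in range(6)}
    line1 = {(r, c) for r in range(2) for c in range(6)}
    line2 = {(r, c) for r in range(2, 4) for c in range(6)}
    print(segmentation_scores([line1, line2], [line1 | line2], ink))  # merged lines
    print(char_accuracy("kitten", "sitten"))
```

In the usage example the two ground-truth lines are merged into a single detected region, so each line overlaps it by only 50% and no one-to-one match is found. Counting only one-to-one matches keeps the sketch short; the contest metrics also define weighted variants for one-to-many and many-to-one matches, which are omitted here.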