This paper presents a novel technique for recognizing broken characters in degraded text documents by modeling the task as a set-partitioning problem (SPP). The technique searches for the optimal partition of the connected components, in which each subset yields one reconstructed character. Because the objective function needed for optimal set partitioning is non-linear, we design an algorithm that we call Heuristic Incremental Integer Programming (HIIP). The algorithm applies integer programming (IP) incrementally and uses heuristics to speed up convergence. The objective function is formulated from probability functions that reflect common OCR measurements: pattern resemblance, sizing conformity, and distance between connected components. We applied HIIP to degraded Thai and English text documents and achieved accuracy rates over 90%. We also compared HIIP against three competing algorithms; HIIP achieved higher accuracy in each comparison.
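The set-partitioning view of broken-character recognition can be illustrated with a small sketch. This is not the HIIP algorithm: it enumerates every partition exhaustively, which is only feasible for a handful of components. The scoring function is a made-up placeholder that covers only toy versions of the sizing and distance terms and omits pattern resemblance entirely. The names `set_partitions`, `block_score`, `best_partition`, and `EXPECTED_WIDTH`, and the representation of a component as a horizontal `(left, right)` extent, are assumptions made for illustration.

```python
import math

EXPECTED_WIDTH = 10.0  # hypothetical nominal character width, in pixels


def set_partitions(items):
    """Yield every partition of `items` into non-empty blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for partition in set_partitions(rest):
        # Either `first` forms a block of its own...
        yield [[first]] + partition
        # ...or it joins each existing block in turn.
        for i, block in enumerate(partition):
            yield partition[:i] + [[first] + block] + partition[i + 1:]


def block_score(block):
    """Toy probability-like score in (0, 1] for one candidate character.

    Placeholder only: a real system would also include a pattern
    resemblance term from a character classifier.
    """
    spans = sorted(block)
    width = max(r for _, r in spans) - min(l for l, _ in spans)
    # Sizing conformity: penalize deviation from the expected width.
    size_term = math.exp(-abs(width - EXPECTED_WIDTH) / EXPECTED_WIDTH)
    # Distance: penalize the largest horizontal gap between neighbors.
    gaps = [max(0.0, b[0] - a[1]) for a, b in zip(spans, spans[1:])]
    gap_term = math.exp(-max(gaps, default=0.0) / EXPECTED_WIDTH)
    return size_term * gap_term


def best_partition(components):
    """Return the partition maximizing the product of block scores."""
    best = max(
        set_partitions(list(components)),
        key=lambda p: math.prod(block_score(b) for b in p),
    )
    # Normalize ordering so the result is easy to read and compare.
    return sorted(sorted(block) for block in best)


# Five fragments: two broken characters around one intact character.
components = [(0, 4), (5, 10), (14, 24), (27, 31), (32, 37)]
print(best_partition(components))
# [[(0, 4), (5, 10)], [(14, 24)], [(27, 31), (32, 37)]]
```

The number of partitions grows as the Bell numbers (15 for four components, 52 for five), so exhaustive enumeration stops being practical very quickly. This growth is presumably one reason an optimization formulation with heuristics is attractive.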