Bounding the probability of error for high precision optical character recognition
The Journal of Machine Learning Research
Despite ubiquitous claims that optical character recognition (OCR) is a "solved problem," many categories of documents continue to break modern OCR software, such as documents with moderate degradation or unusual fonts. Many approaches rely on pre-computed or stored character models, but these are vulnerable when the font of a particular document was not part of the training set, or when a document is so noisy that the font model becomes weak. To address these difficult cases, we present a form of iterative contextual modeling that learns character models directly from the document it is trying to recognize. We use these learned models both to segment characters and to recognize them in an incremental, iterative process. We present results comparable to those of a commercial OCR system on a subset of characters from a difficult test document.
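The iterative process described in the abstract can be sketched as a loop that alternates between estimating document-specific character templates and re-labeling glyphs against them. This is a minimal illustrative sketch, not the paper's actual algorithm: the function name, the mean-image templates, and the nearest-template (sum-of-squared-differences) matcher are all assumptions made for clarity.

```python
import numpy as np

def recognize_iteratively(glyphs, seed_labels, n_iters=3):
    """Hypothetical sketch: refine character models learned from the document itself.

    glyphs      : list of equally sized glyph images (2D float arrays)
    seed_labels : dict {glyph_index: label} of initially confident characters
    Returns a dict assigning a label to every glyph index.
    """
    labels = dict(seed_labels)
    for _ in range(n_iters):
        # 1. Re-estimate one template per label by averaging the glyphs
        #    currently assigned to that label.
        grouped = {}
        for idx, lab in labels.items():
            grouped.setdefault(lab, []).append(glyphs[idx])
        templates = {lab: np.mean(imgs, axis=0) for lab, imgs in grouped.items()}
        # 2. Re-label every glyph with its nearest template
        #    (smallest sum of squared pixel differences).
        labels = {
            i: min(templates, key=lambda lab: np.sum((g - templates[lab]) ** 2))
            for i, g in enumerate(glyphs)
        }
    return labels
```

In practice the paper's method also re-segments characters on each pass; this sketch only shows the model-refinement half of the loop, seeded with a few confidently labeled glyphs.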