A Weighted Finite-State Framework for Correcting Errors in Natural Scene OCR

Authors:
R. Beaufort;C. Mancas-Thillou
Affiliations:
Multitel Research Center Belgium;Faculté Polytechnique de Mons, Belgium
Venue:
ICDAR '07 Proceedings of the Ninth International Conference on Document Analysis and Recognition - Volume 02
Year:
2007

Cited 4

A multifunctional reading assistant for the visually impaired

Journal on Image and Video Processing
Efficient OCR post-processing combining language, hypothesis and error models

SSPR&SPR'10 Proceedings of the 2010 joint IAPR international conference on Structural, syntactic, and statistical pattern recognition
Large-lexicon attribute-consistent text recognition in natural images

ECCV'12 Proceedings of the 12th European conference on Computer Vision - Volume Part VI

Abstract

With the growing market for cheap cameras, natural scene text has to be handled efficiently. Some works deal with text detection in the image, while more recent ones point out the challenges of text extraction and recognition. We propose an OCR correction system that handles the traditional issues of recognizer errors as well as those specific to natural scene images, i.e. cut characters, artistic displays, incomplete sentences (present in advertisements) and out-of-vocabulary (OOV) words such as acronyms. The main algorithm is based on Finite-State Machines (FSMs) that deal with learned OCR confusions, capital/accented letters and lexicon look-up. Moreover, as the OCR is not considered a black box, several of its outputs are taken into account so that the recognition and correction steps are interleaved. Detailed results on a public database of natural scene words are presented, along with future work.
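To make the idea of chaining OCR hypotheses, learned confusions, case/accent handling and lexicon look-up more concrete, here is a minimal sketch under stated assumptions. It is not the authors' implementation. It assumes the multiple OCR outputs form a "sausage" lattice of per-position alternatives whose costs are read as -log probabilities. The `CONFUSIONS` table and its costs are hypothetical, and only substitution errors are modelled, so there are no insertions or deletions for cut characters. A trie stands in for the lexicon acceptor. The best-first search over (lattice position, trie node) pairs explores the same state space as composing the three machines and taking the shortest path. A hypothetical `oov_penalty` keeps the raw OCR reading when no lexicon word is cheap enough, which is one simple way to let OOV words such as acronyms pass through. All function names are illustrative.

```python
import heapq
import math
import unicodedata

# Hypothetical learned confusion costs: -log P(lexicon char | OCR char).
CONFUSIONS = {("0", "o"): 0.5, ("1", "l"): 0.7, ("5", "s"): 0.8, ("8", "b"): 0.9}


def fold(ch):
    """Fold case and strip accents, so 'É' matches lexicon 'e' at no cost."""
    decomposed = unicodedata.normalize("NFD", ch.lower())
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def error_cost(ocr_ch, lex_ch):
    """Error-model arc: free after folding, learned cost for a known confusion."""
    folded = fold(ocr_ch)
    if folded == lex_ch:
        return 0.0
    return CONFUSIONS.get((folded, lex_ch), math.inf)


def build_trie(words):
    """Lexicon acceptor as a trie; the '$' key marks a final state."""
    root = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node["$"] = word
    return root


def best_lexicon_word(lattice, trie):
    """Cheapest lexicon word through lattice -> error model -> lexicon."""
    heap = [(0.0, 0, 0, trie)]  # (cost, position, tie-breaker, trie node)
    counter = 1
    settled = set()
    while heap:
        cost, pos, _, node = heapq.heappop(heap)
        key = (pos, id(node))
        if key in settled:
            continue
        settled.add(key)
        if pos == len(lattice):
            if "$" in node:
                return node["$"], cost  # first final state popped is optimal
            continue
        for ocr_ch, ocr_cost in lattice[pos]:
            for lex_ch, child in node.items():
                if lex_ch == "$":
                    continue
                arc = error_cost(ocr_ch, lex_ch)
                if arc < math.inf:
                    heapq.heappush(heap, (cost + ocr_cost + arc, pos + 1, counter, child))
                    counter += 1
    return None, math.inf


def postprocess(lattice, trie, oov_penalty=2.0):
    """Return the corrected word, or the raw OCR string if it looks OOV."""
    raw_word = "".join(min(alts, key=lambda a: a[1])[0] for alts in lattice)
    raw_cost = sum(min(c for _, c in alts) for alts in lattice)
    word, cost = best_lexicon_word(lattice, trie)
    return word if cost <= raw_cost + oov_penalty else raw_word
```

For example, with `trie = build_trie(["hotel", "hostel", "total", "ete"])`, a lattice in which the OCR read "H0TEL" but also proposed "O" at the second position is corrected to "hotel". An acronym such as "USB", which has no lexicon path, is returned unchanged. Returning the folded lexicon form ("ete" for "ÉTÉ") is a simplification of this sketch. A real system would restore case and accents from the lexicon entry.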