Real-time scene text localization and recognition

Authors:
Lukas Neumann
Affiliations:
Centre for Machine Perception, Department of Cybernetics, Czech Technical University, Prague, Czech Republic
Venue:
CVPR '12 Proceedings of the 2012 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
Year:
2012

Cited 6

Large-lexicon attribute-consistent text recognition in natural images
ECCV'12 Proceedings of the 12th European conference on Computer Vision - Volume Part VI

A real-time scene text to speech system
ECCV'12 Proceedings of the 12th international conference on Computer Vision - Volume Part III

Accurate and robust text detection: a step-in for text retrieval in natural scene images
Proceedings of the 36th international ACM SIGIR conference on Research and development in information retrieval

Con-text: text detection using background connectivity for fine-grained object classification
Proceedings of the 21st ACM international conference on Multimedia

Multilingual OCR research and applications: an overview
Proceedings of the 4th International Workshop on Multilingual OCR

Text extraction from natural scene image: A survey
Neurocomputing

Abstract

An end-to-end real-time scene text localization and recognition method is presented. The real-time performance is achieved by posing the character detection problem as an efficient sequential selection from the set of Extremal Regions (ERs). The ER detector is robust to blur, illumination, color and texture variation and handles low-contrast text. In the first classification stage, the probability of each ER being a character is estimated using novel features calculated with O(1) complexity per region tested. Only ERs with locally maximal probability are selected for the second stage, where the classification is improved using more computationally expensive features. A highly efficient exhaustive search with feedback loops is then applied to group ERs into words and to select the most probable character segmentation. Finally, text is recognized in an OCR stage trained using synthetic fonts. The method was evaluated on two public datasets. On the ICDAR 2011 dataset, the method achieves state-of-the-art text localization results amongst published methods and it is the first one to report results for end-to-end text recognition. On the more challenging Street View Text dataset, the method achieves state-of-the-art recall. The robustness of the proposed method against noise and low contrast of characters is demonstrated by “false positives” caused by detected watermark text in the dataset.
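
The first-stage ideas in the abstract (descriptors that cost O(1) per region, and keeping only ERs whose character probability is locally maximal) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The `RegionStats` class, the choice of area and bounding box as descriptors, the `select_local_maxima` function, and the `p_min` threshold are all hypothetical names and assumptions introduced here. The probabilities are taken as given, since the abstract does not specify the classifier.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class RegionStats:
    """Illustrative descriptors of one region (assumed subset: area and bounding box)."""
    area: int
    x_min: int
    y_min: int
    x_max: int
    y_max: int

    def merge(self, other: "RegionStats") -> "RegionStats":
        # Constant-time update: the merged descriptors are derived from the
        # two summaries alone, without revisiting any pixel of either region.
        return RegionStats(
            self.area + other.area,
            min(self.x_min, other.x_min),
            min(self.y_min, other.y_min),
            max(self.x_max, other.x_max),
            max(self.y_max, other.y_max),
        )

    def aspect_ratio(self) -> float:
        # Width over height of the bounding box (inclusive pixel coordinates).
        return (self.x_max - self.x_min + 1) / (self.y_max - self.y_min + 1)


def select_local_maxima(probs: List[float], p_min: float = 0.5) -> List[int]:
    """Return indices of local maxima of `probs` that reach at least `p_min`.

    `probs` is assumed to hold the estimated character probability of each
    region along one chain of nested regions, ordered by threshold.
    `p_min` is an assumed cut-off, not a value taken from the paper.
    """
    selected = []
    for i, p in enumerate(probs):
        left = probs[i - 1] if i > 0 else float("-inf")
        right = probs[i + 1] if i + 1 < len(probs) else float("-inf")
        # Strict on the left, non-strict on the right, so a plateau is
        # reported once (at its first element).
        if p >= p_min and p > left and p >= right:
            selected.append(i)
    return selected


if __name__ == "__main__":
    merged = RegionStats(3, 0, 0, 2, 0).merge(RegionStats(2, 2, 1, 3, 1))
    print(merged, merged.aspect_ratio())
    print(select_local_maxima([0.1, 0.4, 0.9, 0.6, 0.3, 0.7, 0.2]))
```

In this sketch only the regions returned by `select_local_maxima` would be passed on to a second, more expensive classification stage, which is the sequential selection the abstract describes.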