End-to-end scene text recognition

Authors:
Kai Wang;Boris Babenko;Serge Belongie
Affiliations:
Department of Computer Science and Engineering, University of California, San Diego, USA;Department of Computer Science and Engineering, University of California, San Diego, USA;Department of Computer Science and Engineering, University of California, San Diego, USA
Venue:
ICCV '11 Proceedings of the 2011 International Conference on Computer Vision
Year:
2011

Abstract

This paper focuses on the problem of word detection and recognition in natural images. The problem is significantly more challenging than reading text in scanned documents, and has only recently gained attention from the computer vision community. Sub-components of the problem, such as text detection and cropped image word recognition, have been studied in isolation [7, 4, 20]. However, it remains unclear how these recent approaches contribute to solving the end-to-end problem of word recognition. We fill this gap by constructing and evaluating two systems. The first, representing the de facto state of the art, is a two-stage pipeline consisting of text detection followed by a leading OCR engine. The second is a system rooted in generic object recognition, an extension of our previous work in [20]. We show that the latter approach achieves superior performance. While scene text recognition has generally been treated with highly domain-specific methods, our results demonstrate the suitability of applying generic computer vision methods. Adopting this approach opens the door for real-world scene text recognition to benefit from the rapid advances that have been taking place in object recognition.
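
The two-stage structure mentioned in the abstract (text detection followed by a recognizer applied to each detected region) can be pictured as a simple composition of two components. The sketch below is illustrative only and is not the authors' implementation: the names `Box`, `WordResult`, `two_stage_pipeline`, `detect_text`, and `recognize_word` are hypothetical, and the detector and recognizer are passed in as plain callables so that any concrete method could be substituted.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

# Hypothetical region format: (x, y, width, height) in pixels.
Box = Tuple[int, int, int, int]


@dataclass
class WordResult:
    """One recognized word together with the region it was read from."""
    box: Box
    text: str


def two_stage_pipeline(
    image: object,
    detect_text: Callable[[object], Sequence[Box]],
    recognize_word: Callable[[object, Box], str],
) -> List[WordResult]:
    """Run stage 1 (detection), then stage 2 (recognition) on each region.

    Both stages are assumptions supplied by the caller; this function only
    fixes how they are chained together.
    """
    results: List[WordResult] = []
    for box in detect_text(image):
        text = recognize_word(image, box)
        if text:  # skip regions where the recognizer returned an empty string
            results.append(WordResult(box=box, text=text))
    return results
```

Keeping the two stages behind callables makes the end-to-end evaluation the abstract argues for straightforward in principle: the same function can wrap different detector and recognizer pairs, and their combined output can be scored as a whole rather than stage by stage.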