Detecting and reading text in natural scenes

  • Authors:
  • Xiangrong Chen; Alan L. Yuille

  • Affiliations:
  • Department of Statistics, University of California, Los Angeles, CA; Departments of Statistics and Psychology, University of California, Los Angeles, CA

  • Venue:
  • CVPR '04: Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition
  • Year:
  • 2004

Abstract

This paper gives an algorithm for detecting and reading text in natural images. The algorithm is intended for use by blind and visually impaired subjects walking through city scenes. We first obtain a dataset of city images taken by blind and normally sighted subjects. From this dataset, we manually label and extract the text regions. Next we perform statistical analysis of the text regions to determine which image features are reliable indicators of text and have low entropy (i.e. the feature response is similar for all text images). We obtain weak classifiers by using joint probabilities for feature responses on and off text. These weak classifiers are used as input to an AdaBoost machine learning algorithm to train a strong classifier. In practice, we trained a cascade with 4 strong classifiers containing 79 features. An adaptive binarization and extension algorithm is applied to the regions selected by the cascade classifier. Commercial OCR software is used to read the text or reject the region as non-text. The overall algorithm has a success rate of over 90% (evaluated by complete detection and reading of the text) on the test set, and the unread text is typically small and distant from the viewer.
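The core training step the abstract describes, combining weak classifiers into a strong classifier with AdaBoost, can be sketched as follows. This is a minimal, hypothetical illustration using 1-D decision stumps on toy data; the paper's actual weak classifiers are tests built from joint probabilities of image-feature responses on and off text, and the toy data, thresholds, and round count here are all assumptions, not the authors' implementation.

```python
import math

def stump_predict(x, thresh, polarity):
    # A decision stump: a one-threshold weak classifier returning +1 or -1.
    return polarity if x >= thresh else -polarity

def train_adaboost(xs, ys, n_rounds=4):
    # Discrete AdaBoost: repeatedly pick the weak classifier with lowest
    # weighted error, then reweight examples to emphasize mistakes.
    n = len(xs)
    w = [1.0 / n] * n
    ensemble = []
    for _ in range(n_rounds):
        best = None
        for thresh in sorted(set(xs)):
            for polarity in (1, -1):
                err = sum(wi for wi, x, y in zip(w, xs, ys)
                          if stump_predict(x, thresh, polarity) != y)
                if best is None or err < best[0]:
                    best = (err, thresh, polarity)
        err, thresh, polarity = best
        err = max(err, 1e-10)  # avoid log(inf) on a perfect stump
        alpha = 0.5 * math.log((1 - err) / err)
        # Reweight: misclassified examples gain weight, correct ones lose it.
        w = [wi * math.exp(-alpha * y * stump_predict(x, thresh, polarity))
             for wi, x, y in zip(w, xs, ys)]
        z = sum(w)
        w = [wi / z for wi in w]
        ensemble.append((alpha, thresh, polarity))
    return ensemble

def strong_classify(ensemble, x):
    # The strong classifier is the sign of the alpha-weighted vote.
    score = sum(a * stump_predict(x, t, p) for a, t, p in ensemble)
    return 1 if score >= 0 else -1

# Toy usage: feature values for non-text (-1) and text (+1) regions.
ens = train_adaboost([1.0, 2.0, 3.0, 4.0], [-1, -1, 1, 1])
print(strong_classify(ens, 1.5), strong_classify(ens, 3.5))  # → -1 1
```

In the paper's cascade arrangement, several such strong classifiers are applied in sequence, so most non-text regions are rejected cheaply by the early stages.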