Locating and Recognizing Text in WWW Images

Authors:
Daniel Lopresti; Jiangying Zhou
Affiliations:
Bell Laboratories, Lucent Technologies, Inc., 600 Mountain Avenue, Murray Hill, NJ 07974, USA. dpl@research.bell-labs.com; Summus Ltd., Suite 2200, 2000 Center Point Drive, Columbia, SC 29210, USA. jiangying@summus.com
Venue:
Information Retrieval
Year:
2000

Abstract

The explosive growth of the World Wide Web has resulted in a distributed database consisting of hundreds of millions of documents. While existing search engines index a page based on the text that is readily extracted from its HTML encoding, an increasing amount of the information on the Web is embedded in images. This situation presents a new and exciting challenge for the fields of document analysis and information retrieval, as WWW image text is typically rendered in color and at very low spatial resolutions. In this paper, we survey the results of several years of our work in the area. For the problem of locating text in Web images, we describe a promising procedure based on clustering in color space followed by connected-components analysis. For character recognition, we discuss techniques using polynomial surface fitting and “fuzzy” n-tuple classifiers. We also present the results of several experiments that demonstrate where our methods perform well and where more work needs to be done. We conclude with a discussion of topics for further research.
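To make the text-location idea concrete, the following is a minimal sketch of "clustering in color space followed by connected-components analysis". It is an illustration under stated assumptions, not the procedure from the paper: plain k-means is swapped in as the clustering step, 4-connectivity is assumed, and the names `quantize_colors` and `find_components` are hypothetical. The image is assumed to be a list of rows of RGB tuples.

```python
from collections import deque


def quantize_colors(pixels, k, iters=10):
    """Cluster RGB tuples with plain k-means (an assumed stand-in for the
    clustering step); returns one cluster label per pixel."""
    # Deterministic init: the first k distinct colors in scan order.
    centers = []
    for p in pixels:
        if p not in centers:
            centers.append(p)
        if len(centers) == k:
            break
    labels = [0] * len(pixels)
    for _ in range(iters):
        # Assignment step: nearest center by squared Euclidean distance.
        for i, p in enumerate(pixels):
            labels[i] = min(
                range(len(centers)),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])),
            )
        # Update step: move each center to the mean of its members.
        for c in range(len(centers)):
            members = [pixels[i] for i in range(len(pixels)) if labels[i] == c]
            if members:
                centers[c] = tuple(sum(ch) / len(members) for ch in zip(*members))
    return labels


def find_components(image, k=2):
    """Group 4-connected pixels that share a color cluster.

    Returns a list of dicts with the cluster label, bounding box
    (x_min, y_min, x_max, y_max), and pixel count of each component.
    """
    h, w = len(image), len(image[0])
    labels = quantize_colors([px for row in image for px in row], k)
    seen = [False] * (h * w)
    components = []
    for start in range(h * w):
        if seen[start]:
            continue
        seen[start] = True
        queue = deque([start])
        cluster = labels[start]
        y0, x0 = divmod(start, w)
        x1, y1, size = x0, y0, 0
        while queue:  # breadth-first flood fill within one color cluster
            y, x = divmod(queue.popleft(), w)
            size += 1
            x0, x1 = min(x0, x), max(x1, x)
            y0, y1 = min(y0, y), max(y1, y)
            for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                if 0 <= ny < h and 0 <= nx < w:
                    n = ny * w + nx
                    if not seen[n] and labels[n] == cluster:
                        seen[n] = True
                        queue.append(n)
        components.append(
            {"cluster": cluster, "bbox": (x0, y0, x1, y1), "size": size}
        )
    return components
```

In a sketch like this, small components that share a cluster and sit side by side would be candidate characters, while the large component is likely background. Any size or layout thresholds used to make that decision would be further assumptions not taken from the abstract.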