Extracting Text from WWW Images

Authors:
Jiangying Zhou;Daniel P. Lopresti
Affiliations:
-;-
Venue:
ICDAR '97 Proceedings of the 4th International Conference on Document Analysis and Recognition
Year:
1997

Citing 0
Cited 12

Locating and Recognizing Text in WWW Images
Information Retrieval

Fuzzy Segmentation of Characters in Web Images Based on Human Colour Perception
DAS '02 Proceedings of the 5th International Workshop on Document Analysis Systems V

The Effects of Image Enhancement in OCR Systems: A Prototype
ITCC '00 Proceedings of the The International Conference on Information Technology: Coding and Computing (ITCC'00)

Progress in Camera-Based Document Image Analysis
ICDAR '03 Proceedings of the Seventh International Conference on Document Analysis and Recognition - Volume 1

Two Approaches for Text Segmentation in Web Images
ICDAR '03 Proceedings of the Seventh International Conference on Document Analysis and Recognition - Volume 1

Image classification for mobile web browsing
Proceedings of the 15th international conference on World Wide Web

Initialization enhancer for non-negative matrix factorization
Engineering Applications of Artificial Intelligence

Image streaming and recognition for vehicle location tracking using mobile devices
GPC'07 Proceedings of the 2nd international conference on Advances in grid and pervasive computing

A skeleton-based method for multi-oriented video text detection
DAS '10 Proceedings of the 9th IAPR International Workshop on Document Analysis Systems

Implementation of embedded system for intelligent image recognition and processing
ICCSA'06 Proceedings of the 6th international conference on Computational Science and Its Applications - Volume Part I

A practical license plate recognition system for real-time environments
IWANN'05 Proceedings of the 8th international conference on Artificial Neural Networks: computational Intelligence and Bioinspired Systems

NPIC: hierarchical synthetic image classification using image search and generic features
CIVR'06 Proceedings of the 5th international conference on Image and Video Retrieval

Abstract

In this paper, we examine the problem of locating and extracting text from in-line images in World Wide Web pages. We describe a text detection algorithm based on color clustering and connected component analysis. The algorithm first quantizes the color space of the input image into a number of color classes using a parameter-free clustering procedure. It then identifies text-like connected components in each color class based on their shapes. Finally, a post-processing procedure aligns the text-like components into text lines. Experimental results show that the text extraction algorithm works well on a variety of test images.
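
The three stages named in the abstract (color quantization, per-class connected component analysis with a shape test, and alignment into text lines) can be illustrated with a minimal Python sketch. This is not the authors' implementation. It assumes an image given as a 2D list of RGB tuples, and it substitutes a fixed uniform quantization for the paper's parameter-free clustering, which the abstract does not specify. The function names, the shape thresholds (`max_h`, `max_aspect`, `min_fill`), and the overlap-based line grouping are all hypothetical choices made for illustration.

```python
from collections import deque

def quantize(pixel, levels=4):
    # Assumption: uniform per-channel bucketing stands in for the paper's
    # parameter-free clustering, whose details the abstract does not give.
    step = 256 // levels
    return tuple(c // step for c in pixel)

def connected_components(image):
    """Return (color_class, (top, left, bottom, right), pixel_count) for each 4-connected region."""
    h, w = len(image), len(image[0])
    classes = [[quantize(p) for p in row] for row in image]
    seen = [[False] * w for _ in range(h)]
    comps = []
    for y in range(h):
        for x in range(w):
            if seen[y][x]:
                continue
            cls = classes[y][x]
            queue = deque([(y, x)])
            seen[y][x] = True
            top, left, bottom, right, count = y, x, y, x, 0
            while queue:  # breadth-first flood fill within one color class
                cy, cx = queue.popleft()
                count += 1
                top, bottom = min(top, cy), max(bottom, cy)
                left, right = min(left, cx), max(right, cx)
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if 0 <= ny < h and 0 <= nx < w and not seen[ny][nx] and classes[ny][nx] == cls:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            comps.append((cls, (top, left, bottom, right), count))
    return comps

def is_text_like(box, count, max_h=20, max_aspect=5.0, min_fill=0.2):
    # Hypothetical shape test: bounded height, moderate aspect ratio,
    # and a bounding box that is not almost empty.
    top, left, bottom, right = box
    height, width = bottom - top + 1, right - left + 1
    aspect = max(height, width) / min(height, width)
    fill = count / (height * width)
    return height <= max_h and aspect <= max_aspect and fill >= min_fill

def group_into_lines(boxes):
    """Greedy grouping: a box joins the first line whose vertical extent it overlaps."""
    lines = []
    for box in sorted(boxes, key=lambda b: (b[0], b[1])):
        for line in lines:
            if box[0] <= line["bottom"] and box[2] >= line["top"]:
                line["boxes"].append(box)
                line["top"] = min(line["top"], box[0])
                line["bottom"] = max(line["bottom"], box[2])
                break
        else:
            lines.append({"top": box[0], "bottom": box[2], "boxes": [box]})
    # Order the boxes of each line from left to right.
    return [sorted(line["boxes"], key=lambda b: b[1]) for line in lines]

def extract_text_lines(image):
    """Map each color class to its text lines (lists of component bounding boxes)."""
    by_class = {}
    for cls, box, count in connected_components(image):
        if is_text_like(box, count):
            by_class.setdefault(cls, []).append(box)
    return {cls: group_into_lines(boxes) for cls, boxes in by_class.items()}
```

Calling `extract_text_lines(image)` returns a dictionary keyed by quantized color class, so text rendered in different colors is handled separately, as in the per-class analysis the abstract describes. On a small image the background region can itself pass this simple shape test, so a practical filter would need stricter, size-aware criteria.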