Extracting Text from WWW Images

  • Authors:
  • Jiangying Zhou; Daniel P. Lopresti

  • Venue:
  • ICDAR '97 Proceedings of the 4th International Conference on Document Analysis and Recognition
  • Year:
  • 1997

Abstract

In this paper, we examine the problem of locating and extracting text from in-line images on World Wide Web pages. We describe a text detection algorithm based on color clustering and connected component analysis. The algorithm first quantizes the color space of the input image into a number of color classes using a parameter-free clustering procedure. It then identifies text-like connected components in each color class based on their shapes. Finally, a post-processing procedure aligns the text-like components into text lines. Experimental results show that the algorithm works well on a variety of test images.
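The abstract names three stages (color quantization, shape-based selection of connected components per color class, and alignment into text lines) but gives no details. The following is a minimal Python sketch of that pipeline under stated assumptions, not the authors' method. In particular, the parameter-free clustering is replaced by plain uniform per-channel binning (a swapped-in technique), and the shape test, the line-grouping rule, every threshold (`levels`, `min_h`, `max_aspect`, `max_gap`, `min_chars`) and every function name are hypothetical.

```python
from collections import deque

# Assumption: uniform per-channel binning stands in for the paper's
# parameter-free color clustering, which the abstract does not specify.
def quantize(image, levels=4):
    """Map each (r, g, b) pixel to a color-class label of channel bin indices."""
    step = 256 // levels
    return [[tuple(c // step for c in px) for px in row] for row in image]


def connected_components(labels):
    """Return the 4-connected components of every color class as pixel lists.

    A single scan suffices because a component never spans two color classes.
    """
    h, w = len(labels), len(labels[0])
    seen = [[False] * w for _ in range(h)]
    components = []
    for y in range(h):
        for x in range(w):
            if seen[y][x]:
                continue
            color = labels[y][x]
            seen[y][x] = True
            queue, pixels = deque([(y, x)]), []
            while queue:  # breadth-first flood fill within one color class
                cy, cx = queue.popleft()
                pixels.append((cy, cx))
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if 0 <= ny < h and 0 <= nx < w and not seen[ny][nx] and labels[ny][nx] == color:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            components.append(pixels)
    return components


def bbox(pixels):
    """Bounding box as (top, left, bottom, right), inclusive."""
    ys = [p[0] for p in pixels]
    xs = [p[1] for p in pixels]
    return min(ys), min(xs), max(ys), max(xs)


# Hypothetical shape test: reject specks, components taller or wider than
# half the image (such as the background), and very elongated components.
def is_text_like(box, img_h, img_w, min_h=2, max_aspect=3.0):
    top, left, bottom, right = box
    h, w = bottom - top + 1, right - left + 1
    if h < min_h or h > img_h // 2 or w > img_w // 2:
        return False
    return max(h / w, w / h) <= max_aspect


# Hypothetical grouping rule: scanning boxes left to right, a box joins the
# first line whose last box overlaps it vertically and ends at most
# max_gap pixels to its left; otherwise it starts a new line.
def group_into_lines(boxes, max_gap=3):
    lines = []
    for box in sorted(boxes, key=lambda b: b[1]):
        top, left, bottom, _ = box
        for line in lines:
            l_top, _, l_bottom, l_right = line[-1]
            overlaps = min(bottom, l_bottom) >= max(top, l_top)
            if overlaps and left - l_right <= max_gap:
                line.append(box)
                break
        else:
            lines.append([box])
    return lines


def extract_text_lines(image, min_chars=2):
    """Quantize, keep text-like components, and return lines of >= min_chars boxes."""
    labels = quantize(image)
    h, w = len(image), len(image[0])
    boxes = [bbox(px) for px in connected_components(labels)]
    boxes = [b for b in boxes if is_text_like(b, h, w)]
    return [line for line in group_into_lines(boxes) if len(line) >= min_chars]
```

Here an image is a list of rows of `(r, g, b)` tuples, and `extract_text_lines` returns one list of character bounding boxes per detected line. Uniform binning was chosen only because it needs no external libraries; it does not adapt to the image's palette the way a clustering step would.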