TextFinder: An Automatic System to Detect and Recognize Text in Images

  • Authors:
  • V. Wu;R. Manmatha;E. M. Riseman

  • Venue:
  • TextFinder: An Automatic System to Detect and Recognize Text in Images
  • Year:
  • 1999

Abstract

There are many applications in which the automatic detection and recognition of text embedded in images is useful. These applications include digital libraries, multimedia systems, information retrieval systems, and geographical information systems (GIS). When machine-generated text is printed against clean backgrounds, it can be converted to a computer-readable form (ASCII) using current optical character recognition (OCR) technology. However, text is often printed against shaded or textured backgrounds, or is embedded in images. Examples include maps, advertisements, photographs, videos, and stock certificates. Current document segmentation and recognition technologies cannot handle these situations effectively. In this paper, a four-step system to automatically detect and extract text in images is proposed. First, a texture segmentation scheme is used to focus attention on regions where text may occur. Second, strokes are extracted from the segmented text regions. Using reasonable heuristics on text strings, such as height similarity, spacing, and alignment, the extracted strokes are then processed to form rectangular boxes surrounding the corresponding text strings. To detect text over a wide range of font sizes, the above steps are first applied to a pyramid of images generated from the input image, and then the text boxes formed at each resolution level of the pyramid are fused within the image at the original resolution level. Third, text is extracted by cleaning up the background and binarizing the detected text strings; better bounding boxes are then generated by using the binarized text as strokes. Finally, text is cleaned and binarized from these new boxes. If the extracted text is of an OCR-recognizable font, it is passed through a commercial OCR engine for recognition.
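The multi-resolution fusion step can be illustrated with a minimal sketch. The abstract does not state the actual fusion rule, so everything here is an assumption made for illustration: the pyramid is taken to halve the resolution at each level, boxes are `(x0, y0, x1, y1)` tuples, and overlapping boxes are simply merged into their bounding union. The names `to_original_scale`, `overlaps`, `union`, and `fuse_boxes` are hypothetical.

```python
from typing import List, Tuple

# Hypothetical box type: (x0, y0, x1, y1), with x1 and y1 exclusive.
Box = Tuple[int, int, int, int]


def to_original_scale(box: Box, level: int) -> Box:
    """Map a box found at pyramid `level` back to level-0 coordinates.

    Assumption: each pyramid level halves the previous resolution.
    """
    scale = 2 ** level
    x0, y0, x1, y1 = box
    return (x0 * scale, y0 * scale, x1 * scale, y1 * scale)


def overlaps(a: Box, b: Box) -> bool:
    """True when the two rectangles share a non-empty area."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def union(a: Box, b: Box) -> Box:
    """Smallest rectangle that contains both boxes."""
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def fuse_boxes(boxes_per_level: List[List[Box]]) -> List[Box]:
    """Rescale every box to the original resolution, then merge overlaps.

    Assumption: overlapping boxes describe the same text string. The merge
    is repeated until no two remaining boxes overlap.
    """
    boxes = [
        to_original_scale(box, level)
        for level, level_boxes in enumerate(boxes_per_level)
        for box in level_boxes
    ]
    changed = True
    while changed:
        changed = False
        merged: List[Box] = []
        for box in boxes:
            for i, kept in enumerate(merged):
                if overlaps(box, kept):
                    merged[i] = union(box, kept)
                    changed = True
                    break
            else:
                merged.append(box)
        boxes = merged
    return boxes
```

For instance, a box found at level 0 and an overlapping box found at level 1 collapse into one box at the original resolution, while a distant box found at level 2 is kept separate. A real system would likely use a stricter criterion than plain overlap, such as the height, spacing, and alignment heuristics the abstract mentions.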
The system is stable, robust, and works well on images (with or without structured layouts) from a wide variety of sources, including digitized video frames, photographs, newspapers, advertisements, stock certificates, and personal checks. Color images are converted into grayscale images before the algorithm is carried out. All parameters remain the same for all the experiments presented. We also describe a methodology for automatically evaluating such systems and validate it with a manual evaluation technique.
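The color-to-grayscale preprocessing mentioned above can be sketched as follows. The abstract does not say which conversion is used, so this example assumes the common ITU-R BT.601 luma weights (0.299, 0.587, 0.114) and represents an image as nested lists of `(r, g, b)` tuples. The function name `to_grayscale` is hypothetical.

```python
from typing import List, Tuple

RGBImage = List[List[Tuple[int, int, int]]]
GrayImage = List[List[int]]


def to_grayscale(image: RGBImage) -> GrayImage:
    """Convert 8-bit RGB pixels to 8-bit gray levels.

    Assumption: ITU-R BT.601 luma weights; the original system may use
    a different conversion.
    """
    return [
        [round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
        for row in image
    ]
```

With these weights, pure white and pure black map to 255 and 0, and green contributes more to the gray level than red or blue.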