Restoration of Decorative Headline Images for Document Retrieval

  • Authors: Tomio Amano

  • Venue: DAS '98 Selected Papers from the Third IAPR Workshop on Document Analysis Systems: Theory and Practice
  • Year: 1998

Abstract

This paper describes a method for restoring decorative character images in newspaper and magazine headlines. Although headlines contain useful keywords for document retrieval, conventional OCR systems cannot always recognize them because the characters are often printed in reverse (white on black) and over various background textures. We built filters that generate multiple candidate images by varying a small number of simple parameters (namely, the threshold for stroke-width filtering and whether black and white are reversed), so that one of the candidates contains a "normal" image whose characters are printed in black on a white background. If all the candidate images are recognized and indexed, the keywords in headlines can be expected to be retrievable without manual keyword entry and verification. In an experiment, about 90% of the characters in headline images segmented from newspapers were restored, in the sense that at least one of the candidate images contained the correct character image.
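
The candidate-generation idea can be illustrated with a minimal sketch. The abstract does not specify the actual filter, so everything here is an assumption made for illustration: the image is a binary grid (1 = black, 0 = white), stroke width at a pixel is estimated as the shorter of the horizontal and vertical black runs through it, and the names `invert`, `run_lengths`, `stroke_width_filter`, and `candidate_images` are hypothetical.

```python
from itertools import product


def invert(img):
    """Swap black (1) and white (0) pixels."""
    return [[1 - p for p in row] for row in img]


def run_lengths(line):
    """For each pixel of a 1-D line, return the length of the black run
    that contains it (0 for white pixels)."""
    out = [0] * len(line)
    i = 0
    while i < len(line):
        if line[i] == 1:
            j = i
            while j < len(line) and line[j] == 1:
                j += 1
            for k in range(i, j):
                out[k] = j - i
            i = j
        else:
            i += 1
    return out


def stroke_width_filter(img, threshold):
    """Keep only black pixels whose estimated stroke width is >= threshold.

    Assumption: stroke width is approximated by the shorter of the
    horizontal and vertical runs through the pixel, so thin background
    texture (dots, hairlines) is dropped while thicker strokes remain.
    """
    height = len(img)
    width = len(img[0]) if img else 0
    horiz = [run_lengths(row) for row in img]
    vert = [run_lengths([img[y][x] for y in range(height)]) for x in range(width)]
    return [
        [
            1 if img[y][x] == 1 and min(horiz[y][x], vert[x][y]) >= threshold else 0
            for x in range(width)
        ]
        for y in range(height)
    ]


def candidate_images(img, thresholds=(1, 2, 3)):
    """Return one candidate per (reverse, threshold) parameter combination."""
    candidates = []
    for reverse, threshold in product((False, True), thresholds):
        base = invert(img) if reverse else img
        candidates.append(((reverse, threshold), stroke_width_filter(base, threshold)))
    return candidates


if __name__ == "__main__":
    # A two-pixel-wide vertical stroke plus one isolated texture dot.
    sample = [[0, 1, 1, 0, 1]] + [[0, 1, 1, 0, 0] for _ in range(4)]
    for params, image in candidate_images(sample):
        print(params, image)
```

In a retrieval pipeline along the lines the abstract describes, each candidate would be passed to the OCR engine and every recognition result indexed, so a headline is retrievable as long as any one parameter combination yields a clean black-on-white image.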