Image processing for historical newspaper archives

  • Authors: Takahiro Shima; Kengo Terasawa; Toshio Kawashima
  • Affiliations: Renesas Micro Systems Co., Ltd., Sapporo, Hokkaido, Japan; Future University Hakodate, Hakodate, Hokkaido, Japan; Future University Hakodate, Hakodate, Hokkaido, Japan
  • Venue: Proceedings of the 2011 Workshop on Historical Document Imaging and Processing
  • Year: 2011

Abstract

This paper presents image processing methods that can produce accurate character segmentation results for historical newspaper archives. Full-text search using a word spotting technique is a promising way to make digital archives easier to use. Some word spotting techniques require the target images to be segmented into character images in advance; however, character segmentation is difficult, especially for old and degraded document images. This paper identifies the factors that make character segmentation difficult and removes them in order to improve segmentation accuracy. We first detect the ruled lines using the Hough Transform in order to split a whole newspaper image into column-separated images. We then remove the ruled lines, as well as ruby characters and noise. The proposed system was tested on 20 column-separated images of historical newspapers, and the accuracy of character segmentation improved to 96.3%.
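
The ruled-line detection step can be illustrated with a minimal sketch of the standard Hough Transform on a binary image. This is not the authors' implementation: the abstract names only the technique, so the function names (`hough_accumulate`, `detect_lines`), the one-degree angle resolution, and the vote threshold are assumptions made for illustration. Each foreground pixel votes for every line rho = x·cos(theta) + y·sin(theta) that passes through it, and bins with many votes correspond to long straight lines such as column rules.

```python
import math


def hough_accumulate(image, n_theta=180):
    """Vote in (rho, theta) space for every foreground pixel.

    `image` is a list of rows holding 0 (background) or 1 (ink).
    The angle resolution `n_theta` is an illustrative assumption.
    """
    height, width = len(image), len(image[0])
    max_rho = int(math.ceil(math.hypot(height, width)))
    # Precompute cos/sin for theta = 0, 1, ..., n_theta - 1 steps over [0, pi)
    trig = [(math.cos(math.pi * k / n_theta), math.sin(math.pi * k / n_theta))
            for k in range(n_theta)]
    # rho can be negative, so row indices are offset by max_rho
    acc = [[0] * n_theta for _ in range(2 * max_rho + 1)]
    for y, row in enumerate(image):
        for x, pixel in enumerate(row):
            if pixel:
                for k, (c, s) in enumerate(trig):
                    rho = int(round(x * c + y * s))
                    acc[rho + max_rho][k] += 1
    return acc, max_rho


def detect_lines(image, min_votes, n_theta=180):
    """Return (rho, theta_in_degrees, votes) for every bin with enough votes.

    `min_votes` is a hypothetical threshold; no peak suppression is done,
    so neighbouring angles of a strong line may also be reported.
    """
    acc, max_rho = hough_accumulate(image, n_theta)
    lines = []
    for r, row in enumerate(acc):
        for k, votes in enumerate(row):
            if votes >= min_votes:
                lines.append((r - max_rho, 180.0 * k / n_theta, votes))
    return lines


# Synthetic 20 x 30 "page": one horizontal rule at y = 7, one vertical at x = 12
page = [[0] * 30 for _ in range(20)]
for x in range(30):
    page[7][x] = 1
for y in range(20):
    page[y][12] = 1

candidates = detect_lines(page, min_votes=20)
strongest = max(candidates, key=lambda line: line[2])
```

In this parameterisation a horizontal rule appears at theta = 90 degrees with rho equal to its row, and a vertical rule at theta = 0 degrees with rho equal to its column, so the detected rho values could serve as cut positions when splitting a page into columns. A practical system would likely add peak suppression and tolerate slight skew; those refinements are omitted here.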