Document image binarization by two-stage block extraction and background intensity determination

  • Authors:
  • Yi-Hong Tseng; Hsi-Jian Lee

  • Affiliations:
  • Da Yeh University, Department of Information Management, Changhua, Taiwan, ROC; Tzu Chi University, Department of Medical Informatics, Hualien, Taiwan, ROC

  • Venue:
  • Pattern Analysis & Applications
  • Year:
  • 2007

Abstract

This paper presents a novel approach to binarizing document images. All blocks with individual background intensity values in a document image are first extracted using a two-stage extraction procedure. Then, the intensity distribution of each block is calculated to determine the variation ranges of its background intensity. Within each extracted block, pixels whose intensity values fall within these ranges are regarded as background pixels. Pixels outside all extracted blocks are binarized using Otsu's global threshold method. To evaluate the developed system, 275 representative document images are collected, and the binarization results are assessed by recognizing characters extracted from the binarized images. The binarized images are generated using the proposed approach and other existing approaches, then fed into the same optical character recognition system to assess the practicality of each method. The proposed document binarization method obtains the highest recognition accuracy.
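
The flow the abstract describes (a background range per block, with Otsu's threshold for everything else) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes the blocks have already been extracted and are passed in as rectangles. It also assumes that each block has a single background range, centred on the block's most frequent intensity, with a hypothetical `half_width` parameter. The abstract specifies neither the extraction procedure nor how the ranges are derived. The function names `otsu_threshold` and `binarize` are illustrative only.

```python
def otsu_threshold(values):
    """Return the Otsu threshold t for 8-bit intensities; values <= t form the dark class."""
    hist = [0] * 256
    for v in values:
        hist[v] += 1
    total = len(values)
    sum_all = sum(i * h for i, h in enumerate(hist))
    best_t, best_var = 0, -1.0
    w0 = 0      # number of pixels with intensity <= t
    sum0 = 0    # intensity sum of those pixels
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue
        mu0 = sum0 / w0
        mu1 = (sum_all - sum0) / w1
        # Between-class variance, up to a constant factor of 1 / total**2.
        var = w0 * w1 * (mu0 - mu1) ** 2
        if var > best_var:
            best_t, best_var = t, var
    return best_t


def binarize(gray, blocks, half_width=20):
    """Return a 2-D list of booleans, True marking foreground (text) pixels.

    gray   -- 2-D list of 8-bit intensities
    blocks -- (top, left, bottom, right) rectangles, assumed already extracted
    half_width -- hypothetical half-width of the background range (an assumption)
    """
    height, width = len(gray), len(gray[0])
    foreground = [[False] * width for _ in range(height)]
    covered = [[False] * width for _ in range(height)]

    for top, left, bottom, right in blocks:
        hist = [0] * 256
        for y in range(top, bottom):
            for x in range(left, right):
                hist[gray[y][x]] += 1
        # Assumption: the block's background intensity is its histogram mode.
        mode = hist.index(max(hist))
        lo, hi = mode - half_width, mode + half_width
        for y in range(top, bottom):
            for x in range(left, right):
                # Pixels inside the range are background; the rest are foreground.
                foreground[y][x] = not (lo <= gray[y][x] <= hi)
                covered[y][x] = True

    # Pixels outside every block are binarized with a single Otsu threshold.
    outside = [(y, x) for y in range(height) for x in range(width)
               if not covered[y][x]]
    if outside:
        t = otsu_threshold([gray[y][x] for y, x in outside])
        for y, x in outside:
            foreground[y][x] = gray[y][x] <= t
    return foreground
```

Calling `binarize(gray, [(0, 0, 10, 10)])` treats the top-left 10×10 region as one block with its own background level and thresholds the rest of the page globally. The sketch assumes dark text on a lighter background outside the blocks.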