Document image binarization using background estimation and stroke edges

  • Authors:
  • Shijian Lu; Bolan Su; Chew Lim Tan

  • Affiliations:
  • Institute for Infocomm Research, Department of Computer Vision and Image Understanding, 1 Fusionopolis Way, #21-01 Connexis, 138632, Singapore, Singapore; National University of Singapore, Department of Computer Science, School of Computing, Computing 1, 13 Computing Drive, 117417, Singapore, Singapore; National University of Singapore, Department of Computer Science, School of Computing, Computing 1, 13 Computing Drive, 117417, Singapore, Singapore

  • Venue:
  • International Journal on Document Analysis and Recognition
  • Year:
  • 2010

Abstract

Document images often suffer from different types of degradation that render document image binarization a challenging task. This paper presents a document image binarization technique that accurately segments the text from badly degraded document images. The proposed technique is based on the observations that text documents usually have a background of uniform color and texture, and that the document text has a different intensity level from the surrounding background. Given a document image, the proposed technique first estimates a document background surface through an iterative polynomial smoothing procedure. Different types of document degradation are then compensated for by using the estimated background surface. The text stroke edges are further detected from the compensated document image by using the L1-norm image gradient. Finally, the document text is segmented by a local threshold that is estimated from the detected text stroke edges. The proposed technique was submitted to the recent Document Image Binarization Contest (DIBCO) held under the framework of ICDAR 2009, where it achieved the top performance among 43 algorithms submitted by 35 international research groups.
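The four steps named in the abstract (background estimation, compensation, L1-norm gradient, edge-based local threshold) can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract gives no parameters, so the polynomial degree, the iteration count, the outlier rule, the global edge threshold (mean plus one standard deviation of the gradient), the window radius, the minimum edge count, and the safety margin are all assumptions chosen for illustration, and every function name is hypothetical.

```python
import numpy as np

def fit_line_background(signal, degree=2, iterations=3):
    """Fit a polynomial to one row or column, iteratively ignoring dark outliers."""
    x = np.linspace(-1.0, 1.0, signal.size)
    keep = np.ones(signal.size, dtype=bool)
    fitted = signal.astype(float)
    for _ in range(iterations):
        if keep.sum() <= degree:
            break
        coeffs = np.polyfit(x[keep], signal[keep], degree)
        fitted = np.polyval(coeffs, x)
        residual = signal - fitted
        # Assumed outlier rule: text is darker than the background, so drop
        # pixels lying more than one standard deviation below the fit.
        keep = residual >= -(residual[keep].std() + 1e-6)
    return fitted

def estimate_background(image, degree=2, iterations=3):
    """Smooth row by row, then column by column on the row result."""
    rows = np.apply_along_axis(fit_line_background, 1, image, degree, iterations)
    return np.apply_along_axis(fit_line_background, 0, rows, degree, iterations)

def compensate(image, background):
    """Divide out the background surface and rescale to its mean level."""
    return image * (background.mean() / np.maximum(background, 1e-6))

def l1_gradient(image):
    """L1-norm gradient: sum of absolute vertical and horizontal differences."""
    gy, gx = np.gradient(image)
    return np.abs(gx) + np.abs(gy)

def box_sum(values, radius):
    """Sum over a (2*radius+1)^2 window around each pixel via an integral image."""
    padded = np.pad(values, radius, mode="constant")
    integral = np.pad(padded, ((1, 0), (1, 0)), mode="constant").cumsum(0).cumsum(1)
    k = 2 * radius + 1
    return (integral[k:, k:] - integral[:-k, k:]
            - integral[k:, :-k] + integral[:-k, :-k])

def binarize(image, radius=5, min_edges=5, margin=0.05):
    """Return a boolean mask that is True where a pixel is classified as text."""
    image = np.asarray(image, dtype=float)
    flat = compensate(image, estimate_background(image))
    grad = l1_gradient(flat)
    # Assumed global edge threshold; the abstract does not say how edges are selected.
    edges = grad > grad.mean() + grad.std()
    edge_count = box_sum(edges.astype(float), radius)
    edge_mean = box_sum(flat * edges, radius) / np.maximum(edge_count, 1.0)
    # Local threshold: darker than the mean intensity of nearby edge pixels.
    # The margin (an assumption) keeps flat background pixels whose window
    # sees only the bright side of an edge from being labeled as text.
    return (edge_count >= min_edges) & (flat < (1.0 - margin) * edge_mean)
```

Calling `binarize(image)` on a 2-D grayscale array with dark text on a lighter background returns a mask of the same shape. The sketch assumes text covers only a small fraction of each row and column; where text dominates a line, a low-degree polynomial can fit the text itself rather than the background.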