Image thresholding for optical character recognition and other applications requiring character image extraction

  • Authors:
  • J. M. White; G. D. Rohrer

  • Affiliations:
  • IBM Information Products Division, Charlotte, North Carolina (both authors)

  • Venue:
  • IBM Journal of Research and Development
  • Year:
  • 1983

Abstract

Two new, cost-effective thresholding algorithms for extracting binary images of characters from machine- or hand-printed documents are described. To create a binary representation from an analog image, such an algorithm must decide whether each point becomes a binary one, because it falls within a character stroke, or a binary zero, because it does not. This thresholding is a critical step in Optical Character Recognition (OCR). It is also essential for other Character Image Extraction (CIE) applications, such as processing machine-printed or handwritten characters from carbon-copy forms or bank checks, where smudges and scenic backgrounds, for example, may have to be suppressed. The first algorithm, a nonlinear, adaptive procedure, is implemented with a minimum of hardware and is intended for many CIE applications. The second is a more aggressive approach aimed at specialized, high-volume applications that justify the extra complexity.
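The abstract does not give the details of either algorithm, so the sketch below is not White and Rohrer's procedure. It is a generic local-mean adaptive threshold, a different and widely used technique, included only to illustrate the decision the abstract describes: each point becomes 1 (inside a dark character stroke) or 0 (background). The function name `adaptive_threshold` and its `window` and `offset` parameters are hypothetical, and the sketch assumes dark ink on a lighter background with gray values from 0 (black) to 255 (white).

```python
def adaptive_threshold(image, window=3, offset=0):
    """Binarize a grayscale image given as a list of equal-length rows.

    Hypothetical illustration (not the paper's algorithm): a pixel becomes 1
    (character stroke) when it is darker than the mean of its window x window
    neighbourhood minus `offset`; otherwise it becomes 0 (background).
    Neighbourhoods are clipped at the image borders.
    """
    rows, cols = len(image), len(image[0])
    half = window // 2
    binary = []
    for r in range(rows):
        out_row = []
        for c in range(cols):
            # Clip the neighbourhood to the image bounds.
            r0, r1 = max(0, r - half), min(rows, r + half + 1)
            c0, c1 = max(0, c - half), min(cols, c + half + 1)
            total = sum(image[i][j] for i in range(r0, r1) for j in range(c0, c1))
            local_mean = total / ((r1 - r0) * (c1 - c0))
            # Dark relative to its surroundings -> part of a stroke.
            out_row.append(1 if image[r][c] < local_mean - offset else 0)
        binary.append(out_row)
    return binary


# A vertical stroke on a clean background and on a darker, smudged one.
clean = [[200, 200, 50, 200, 200]] * 3
smudged = [[120, 120, 40, 120, 120]] * 3
print(adaptive_threshold(clean, window=3, offset=20))    # [[0, 0, 1, 0, 0], [0, 0, 1, 0, 0], [0, 0, 1, 0, 0]]
print(adaptive_threshold(smudged, window=3, offset=20))  # [[0, 0, 1, 0, 0], [0, 0, 1, 0, 0], [0, 0, 1, 0, 0]]
```

Because the threshold follows the local background, the stroke is isolated in both cases. A single fixed threshold of 130, by contrast, would mark every pixel of the smudged rows as stroke, which is the kind of background suppression problem the abstract raises for carbon copies and bank checks.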