A text image enhancement system based on segmentation and classification methods

  • Authors:
  • Yaguang Yang;Kristen Summers;Mark Turner

  • Affiliations:
  • CACI International Inc., Lanham, MD;CACI International Inc., Lanham, MD;CACI International Inc., Lanham, MD

  • Venue:
  • Proceedings of the 1st ACM workshop on Hardcopy document processing
  • Year:
  • 2004

Abstract

This paper describes document processing techniques used in ImageRefiner, an automatic image enhancement system developed by CACI International Inc. Although the system uses other methods as well, we focus on two techniques that are either novel and well tested or particularly important to the system. The first is a novel segmentation method that divides a text image file into "homogeneous" segments. The second is the use of a neural network to select the best transformation for each segment. Our experiments show that after each segment is processed with the transformation selected by the neural network, the fully processed images usually yield more accurate OCR output. On a test set of Arabic files, OCR accuracy on the processed images was on average 35% better than on the original images.
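The per-segment pipeline summarized above (segment the image, let a classifier choose one transformation per segment, apply it) can be sketched as a toy. This is not ImageRefiner's method, and every component here is a hypothetical stand-in. Grouping rows into bands by mean intensity replaces the paper's novel segmentation. The three candidate transforms (`identity`, `binarize`, `stretch`) are made up for illustration. A single-layer linear scorer with hand-picked `WEIGHTS`, using one contrast feature, replaces the paper's trained neural network.

```python
from statistics import mean, pstdev


def segment_rows(img, tol=20.0):
    """Hypothetical stand-in for homogeneous segmentation: group consecutive
    rows into bands whose mean intensity stays within `tol` of the band's
    first row. Returns (start, end) row ranges, end exclusive."""
    bands, start = [], 0
    for i in range(1, len(img)):
        if abs(mean(img[i]) - mean(img[start])) > tol:
            bands.append((start, i))
            start = i
    bands.append((start, len(img)))
    return bands


# Hypothetical candidate transformations on grayscale rows (values 0..255).
def identity(seg):
    return [row[:] for row in seg]


def binarize(seg, t=128):
    return [[255 if p >= t else 0 for p in row] for row in seg]


def stretch(seg):
    # Linear contrast stretch of the segment's range onto 0..255.
    lo = min(min(r) for r in seg)
    hi = max(max(r) for r in seg)
    if hi == lo:
        return identity(seg)
    return [[round((p - lo) * 255 / (hi - lo)) for p in row] for row in seg]


TRANSFORMS = [identity, binarize, stretch]

# Hand-picked (NOT trained) weights, one row per transform: [w_contrast, bias].
WEIGHTS = [
    [2.0, -0.5],   # identity: favored for already high-contrast segments
    [1.0, 0.0],    # binarize: favored for moderate contrast
    [-2.0, 1.0],   # stretch: favored for low-contrast segments
]


def features(seg):
    # Contrast feature (pixel standard deviation scaled by 128) plus a bias input.
    pixels = [p for row in seg for p in row]
    return [pstdev(pixels) / 128.0, 1.0]


def select_transform(seg):
    # Single-layer linear scoring; the highest-scoring transform wins.
    f = features(seg)
    scores = [sum(w * x for w, x in zip(ws, f)) for ws in WEIGHTS]
    return TRANSFORMS[scores.index(max(scores))]


def enhance(img):
    """Apply the selected transform to each segment and reassemble the image.
    Returns (enhanced_image, names_of_chosen_transforms)."""
    out, choices = [], []
    for start, end in segment_rows(img):
        seg = img[start:end]
        t = select_transform(seg)
        choices.append(t.__name__)
        out.extend(t(seg))
    return out, choices
```

For example, an image with a crisp band, a faint low-contrast band, and a mid-contrast band would be split into three segments, with `identity`, `stretch`, and `binarize` chosen respectively. Choosing the transform per segment, rather than once per page, lets regions that are already clean pass through unchanged while degraded regions are corrected.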