Correcting broken characters in the recognition of historical printed documents

  • Authors: Michael Droettboom
  • Affiliations: Johns Hopkins University, Baltimore, MD
  • Venue: Proceedings of the 3rd ACM/IEEE-CS joint conference on Digital libraries
  • Year: 2003

Abstract

This paper presents a new technique for dealing with broken characters, one of the major challenges in the optical character recognition (OCR) of degraded historical printed documents. A technique based on graph combinatorics is used to rejoin the appropriate connected components. It has been applied to real data with successful results.
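The abstract does not spell out the algorithm, so the following is only a rough sketch of the general idea of rejoining connected components through a graph, not the paper's actual method. It assumes each connected component is reduced to a bounding box, links boxes that lie within a small gap of each other, enumerates small connected groups in that graph, and keeps the grouping with the best total score. All names (`build_graph`, `connected_groups`, `best_partition`, `toy_score`), the `max_gap` and `max_size` parameters, and the size-based scoring are hypothetical; a real system would presumably score each candidate group with a trained character classifier.

```python
from itertools import combinations

# A box is (x0, y0, x1, y1). Every name below is illustrative, not from the paper.

def gap(a, b):
    """Largest axis-wise gap between two boxes (0 if they overlap on both axes)."""
    dx = max(a[0] - b[2], b[0] - a[2], 0)
    dy = max(a[1] - b[3], b[1] - a[3], 0)
    return max(dx, dy)

def build_graph(boxes, max_gap):
    """Link every pair of components whose boxes are at most max_gap apart."""
    adj = {i: set() for i in range(len(boxes))}
    for i, j in combinations(range(len(boxes)), 2):
        if gap(boxes[i], boxes[j]) <= max_gap:
            adj[i].add(j)
            adj[j].add(i)
    return adj

def connected_groups(adj, start, allowed, max_size):
    """All connected subsets of `allowed` that contain `start`, up to max_size nodes."""
    found = set()

    def grow(group):
        found.add(group)
        if len(group) == max_size:
            return
        for node in group:
            for nb in adj[node]:
                if nb in allowed and nb not in group:
                    bigger = group | {nb}
                    if bigger not in found:
                        grow(bigger)

    grow(frozenset([start]))
    return found

def merged_box(boxes, group):
    """Bounding box of all components in the group."""
    x0, y0, x1, y1 = zip(*(boxes[i] for i in group))
    return (min(x0), min(y0), max(x1), max(y1))

def toy_score(box, glyph_w=10, glyph_h=14):
    """Assumed stand-in for a classifier: 1.0 when the box matches a nominal glyph size."""
    w, h = box[2] - box[0], box[3] - box[1]
    return 1.0 / (1.0 + abs(w - glyph_w) + abs(h - glyph_h))

def best_partition(boxes, adj, score, max_size=3):
    """Exhaustively pick the grouping that maximizes the size-weighted score sum."""
    def solve(remaining):
        if not remaining:
            return 0.0, []
        start = min(remaining)  # the lowest unassigned component must go somewhere
        best = (-1.0, [])
        for group in connected_groups(adj, start, remaining, max_size):
            rest_score, rest_groups = solve(remaining - group)
            total = score(merged_box(boxes, group)) * len(group) + rest_score
            if total > best[0]:
                best = (total, [sorted(group)] + rest_groups)
        return best

    return solve(frozenset(range(len(boxes))))[1]

# One glyph broken into an upper and a lower fragment, plus one intact glyph.
boxes = [(0, 0, 10, 6), (0, 8, 10, 14), (20, 0, 30, 14)]
adj = build_graph(boxes, max_gap=3)
print(best_partition(boxes, adj, toy_score))  # prints [[0, 1], [2]]
```

The search here is exhaustive and therefore exponential in the worst case; it is meant only to make the idea concrete on a handful of components.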