A filter based post-OCR accuracy boost system

  • Authors:
  • Eugene Borovikov; Ilya Zavorin; Mark Turner

  • Affiliations:
  • CACI International Inc., Lanham, MD; CACI International Inc., Lanham, MD; CACI International Inc., Lanham, MD

  • Venue:
  • Proceedings of the 1st ACM workshop on Hardcopy document processing
  • Year:
  • 2004

Abstract

Our current research aims to build a filter-based post-OCR accuracy boost system that combines different post-OCR correction filters to improve OCR accuracy beyond what any individual filter achieves. In this paper we focus on a Hidden Markov Model (HMM) based accuracy booster that models OCR engine noise generation as a two-layer stochastic process. We employ a commercial spell-checker both as another error correction filter and as a baseline for accuracy boost comparison. We demonstrate the versatility of our approach in experiments with documents in English and Arabic.
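
As an illustration of the general idea behind HMM-based post-OCR correction, the following is a minimal sketch only, not the authors' implementation. It assumes one plausible reading of the two-layer process, which the abstract does not spell out: the hidden layer is the true character sequence and the observed layer is the OCR output. The function name `viterbi`, the toy probability tables, and the smoothing floor are all hypothetical values chosen for the example. Decoding uses the standard Viterbi algorithm.

```python
import math

FLOOR = 1e-12  # assumed smoothing floor for events missing from the toy tables


def _log(p):
    return math.log(max(p, FLOOR))


def viterbi(observed, states, start_p, trans_p, emit_p):
    """Return the most likely hidden (true) string for a non-empty OCR string."""
    # Each column maps a state to (best log-probability, backpointer).
    columns = [{
        s: (_log(start_p.get(s, 0)) + _log(emit_p[s].get(observed[0], 0)), None)
        for s in states
    }]
    for ch in observed[1:]:
        prev = columns[-1]
        col = {}
        for s in states:
            # Best predecessor state r for ending in state s at this position.
            score, arg = max(
                (prev[r][0] + _log(trans_p[r].get(s, 0)), r) for r in states
            )
            col[s] = (score + _log(emit_p[s].get(ch, 0)), arg)
        columns.append(col)

    # Backtrack from the best final state.
    path = [max(states, key=lambda s: columns[-1][s][0])]
    for col in reversed(columns[1:]):
        path.append(col[path[-1]][1])
    return "".join(reversed(path))


# Toy model (hypothetical numbers): hidden states are true characters.
STATES = ["c", "a", "o", "t"]
START_P = {"c": 1.0}
# Character bigram model of the language (hidden layer).
TRANS_P = {"c": {"a": 0.9, "o": 0.1}, "a": {"t": 1.0}, "o": {"t": 1.0}, "t": {}}
# OCR noise model (observed layer): what the engine prints for each true character.
EMIT_P = {
    "c": {"c": 1.0},
    "a": {"a": 0.7, "o": 0.3},   # 'a' is sometimes misread as 'o'
    "o": {"o": 0.9, "0": 0.1},   # 'o' is sometimes misread as the digit '0'
    "t": {"t": 1.0},
}

print(viterbi("cot", STATES, START_P, TRANS_P, EMIT_P))  # cat
print(viterbi("c0t", STATES, START_P, TRANS_P, EMIT_P))  # cot
```

In this toy setting the decoder rewrites "cot" as "cat" because the assumed language model strongly prefers "ca" over "co", while "c0t" becomes "cot" because only "o" can emit "0". A real system would estimate both tables from aligned ground-truth and OCR text rather than setting them by hand.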