Hybrid OCR combination for ancient documents

  • Authors:
  • Hubert Cecotti; Abdel Belaïd

  • Affiliations:
  • READ Group, LORIA/CNRS, Vandoeuvre-les-Nancy, France;READ Group, LORIA/CNRS, Vandoeuvre-les-Nancy, France

  • Venue:
  • ICAPR'05 Proceedings of the Third international conference on Advances in Pattern Recognition - Volume Part I
  • Year:
  • 2005

Abstract

Commercial Optical Character Recognition (OCR) systems have improved considerably in the last few years. Their main strength is their ability to process many different kinds of documents. However, this generality can also be a weakness, as they cannot perfectly recognize documents that differ substantially from typical present-day documents. In this paper we propose a system that combines several OCRs with a specialized ICR (Intelligent Character Recognition) module based on a convolutional neural network. Instead of just running several OCRs in parallel and applying a fusion rule to the results, a specialized neural network with an adaptive topology is added to complement the OCRs, according to the errors the OCRs make. The system has been tested on ancient documents containing old characters and old fonts not used in contemporary documents. Combining the OCRs increases recognition by about 3%, whereas the ICR improves the recognition of rejected characters by more than 5%.
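
The pipeline described in the abstract (several OCRs run on the same character, their outputs are fused, and rejected characters are passed to a specialized ICR) can be illustrated with a minimal sketch. The abstract does not specify the fusion rule, so a plain majority vote with rejection on weak agreement or ties is assumed here. The names `fuse_votes`, `recognize`, `min_agreement`, and the `icr` callable are hypothetical and stand in for components the paper does not detail.

```python
from collections import Counter
from typing import Callable, Optional, Sequence


def fuse_votes(candidates: Sequence[Optional[str]],
               min_agreement: int = 2) -> Optional[str]:
    """Fuse per-OCR labels for one character by majority vote.

    Assumption: majority voting stands in for the paper's fusion rule.
    Returns None (reject) when no label has enough support or when the
    top two labels are tied.
    """
    # Ignore engines that produced no label for this character.
    votes = Counter(c for c in candidates if c is not None)
    if not votes:
        return None
    ranked = votes.most_common(2)
    label, count = ranked[0]
    if count < min_agreement:
        return None  # too little agreement among the OCRs
    if len(ranked) > 1 and ranked[1][1] == count:
        return None  # tie for first place
    return label


def recognize(candidates: Sequence[Optional[str]],
              glyph: object,
              icr: Callable[[object], str],
              min_agreement: int = 2) -> str:
    """Use the fused OCR label, falling back to the ICR on rejection.

    `icr` is a placeholder for the specialized classifier (a
    convolutional neural network in the paper); here it is any callable
    mapping a character image to a label.
    """
    fused = fuse_votes(candidates, min_agreement)
    if fused is not None:
        return fused
    return icr(glyph)
```

For example, with three engines reporting `["s", "s", "f"]` the fused label is `"s"` and the ICR is never called, while `["s", "f", "l"]` is rejected and the decision is delegated to `icr`. Calling the ICR only on rejected characters mirrors the abstract's description of the ICR as a complement to the OCRs rather than a replacement.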