Ocropodium: open source OCR for small-scale historical archives

  • Authors:
  • Tobias Blanke; Michael Bryant; Mark Hedges

  • Affiliations:
  • King's College London, UK; King's College London, UK; King's College London, UK

  • Venue:
  • Journal of Information Science
  • Year:
  • 2012

Abstract

Large-scale digitization projects dealing with text-based historical material face challenges that commercial software does not cater for well. This article discusses the results of a project to build a scalable OCR workflow for historical collections, based on open source tools and tailored particularly to the needs of small-scale historical archives. It argues that open source tools can be customized more closely to these requirements, particularly with regard to character model training and per-project language modelling. We present the results of our accuracy evaluation of various open source OCR tools, as well as a case study of the challenges and opportunities of open source OCR in historical archives.
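The abstract does not say which accuracy metric the evaluation used. A common way to score OCR output against a hand-keyed ground truth is the character error rate (CER): the Levenshtein edit distance between the reference and the OCR hypothesis, divided by the length of the reference. The sketch below assumes that metric purely for illustration. The function names `levenshtein` and `character_error_rate` are hypothetical and are not taken from Ocropodium or any OCR library.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn string a into string b."""
    # prev[j] holds the edit distance between a[:i-1] and b[:j]
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty string
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,         # delete ca
                            curr[j - 1] + 1,     # insert cb
                            prev[j - 1] + cost)) # substitute (or match)
        prev = curr
    return prev[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """Hypothetical CER helper: edit distance normalised by the length
    of the ground-truth reference transcription."""
    if not reference:
        raise ValueError("reference transcription must be non-empty")
    return levenshtein(reference, hypothesis) / len(reference)


if __name__ == "__main__":
    # Two OCR substitutions ('i' -> '1', 'o' -> '0') in a 17-character line
    print(character_error_rate("Archive of London", "Arch1ve of Lond0n"))
```

With two substituted characters in a 17-character reference line, the CER is 2/17 (about 0.118). Because CER is normalised by the reference length, it can exceed 1.0 when the OCR output contains many spurious insertions, which is why some evaluations report it alongside a word-level error rate.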