Improving OCR accuracy for classical critical editions

  • Authors:
  • Federico Boschetti; Matteo Romanello; Alison Babeu; David Bamman; Gregory Crane

  • Affiliations:
  • Tufts University, Perseus Digital Library, Medford, MA (all five authors)

  • Venue:
  • ECDL '09: Proceedings of the 13th European Conference on Research and Advanced Technology for Digital Libraries
  • Year:
  • 2009

Abstract

This paper describes a workflow designed to populate a digital library of ancient Greek critical editions with highly accurate OCR text. Although the most recent OCR engines can now, after suitable training, handle the polytonic Greek fonts used in 19th- and 20th-century editions, accuracy can be improved further through post-processing. In particular, the paper discusses a progressive multiple alignment method applied to different OCR outputs of the same page images.
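As an illustration of the technique named above, the following Python sketch shows one common way to perform character-level progressive multiple alignment of several OCR outputs. Each new output is aligned against the profile built so far with a Needleman-Wunsch dynamic program using sum-of-pairs column scores, and a consensus string is then read off by majority vote per column. The scoring constants (`MATCH`, `MISMATCH`, `GAP_PENALTY`), the fixed input order, the NFC normalization step and the voting rule are all assumptions of this sketch, not the authors' published parameters, and every function name here is hypothetical.

```python
import unicodedata
from collections import Counter
from typing import List, Optional

# Assumption: gaps are marked with None, because hyphens and dashes are real
# characters in OCR text and must not be confused with alignment gaps.
GAP = None
# Assumption: simple illustrative scores, not tuned values from the paper.
MATCH, MISMATCH, GAP_PENALTY = 1, -1, -1

Column = List[Optional[str]]


def pair_score(a: Optional[str], b: Optional[str]) -> int:
    if a is GAP and b is GAP:
        return 0
    if a is GAP or b is GAP:
        return GAP_PENALTY
    return MATCH if a == b else MISMATCH


def column_score(column: Column, ch: Optional[str]) -> int:
    # Sum-of-pairs score of placing `ch` under every symbol already in the column.
    return sum(pair_score(x, ch) for x in column)


def align_to_profile(profile: List[Column], depth: int, seq: str) -> List[Column]:
    """Global (Needleman-Wunsch) alignment of one string against a profile of `depth` rows."""
    n, m = len(profile), len(seq)
    new_column_cost = depth * GAP_PENALTY  # every existing row gets a gap
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = dp[i - 1][0] + column_score(profile[i - 1], GAP)
    for j in range(1, m + 1):
        dp[0][j] = dp[0][j - 1] + new_column_cost
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = max(
                dp[i - 1][j - 1] + column_score(profile[i - 1], seq[j - 1]),
                dp[i - 1][j] + column_score(profile[i - 1], GAP),
                dp[i][j - 1] + new_column_cost,
            )
    # Trace back from the bottom-right cell, rebuilding the columns in reverse.
    columns: List[Column] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and dp[i][j] == dp[i - 1][j - 1] + column_score(profile[i - 1], seq[j - 1]):
            columns.append(profile[i - 1] + [seq[j - 1]])
            i, j = i - 1, j - 1
        elif i > 0 and dp[i][j] == dp[i - 1][j] + column_score(profile[i - 1], GAP):
            columns.append(profile[i - 1] + [GAP])
            i -= 1
        else:
            columns.append([GAP] * depth + [seq[j - 1]])
            j -= 1
    columns.reverse()
    return columns


def progressive_align(outputs: List[str]) -> List[Column]:
    """Fold at least one OCR output into a single alignment, adding them in the given order."""
    # NFC keeps precomposed polytonic letters (e.g. U+1FC6) as single code points.
    texts = [unicodedata.normalize("NFC", t) for t in outputs]
    profile: List[Column] = [[ch] for ch in texts[0]]
    for depth, seq in enumerate(texts[1:], start=1):
        profile = align_to_profile(profile, depth, seq)
    return profile


def consensus(profile: List[Column]) -> str:
    """Majority vote per column; ties go to the earliest output (Counter keeps first-seen order)."""
    chars = []
    for column in profile:
        ch, _ = Counter(column).most_common(1)[0]
        if ch is not GAP:
            chars.append(ch)
    return "".join(chars)


# Hypothetical OCR outputs for the same line, each with a different diacritic error.
ocr_runs = ["μῆνιν ἄειδε", "μηνιν ἄειδε", "μῆνιν ἀειδε"]
print(consensus(progressive_align(ocr_runs)))  # prints: μῆνιν ἄειδε
```

Adding outputs one at a time keeps each step a two-way dynamic program, at the cost that early alignment decisions are never revisited. Progressive aligners in other fields, such as ClustalW, choose the merge order from a guide tree rather than taking the inputs as given, and a production system might also weight each vote by the known reliability of the engine that produced it.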