A corpus for OCR research on mathematical expressions

  • Authors:
  • Utpal Garain; B. Chaudhuri

  • Affiliations:
  • Computer Vision & Pattern Recognition Unit, Indian Statistical Institute, 203 B. T. Road, 700 035, Calcutta, India; Computer Vision & Pattern Recognition Unit, Indian Statistical Institute, 203 B. T. Road, 700 035, Calcutta, India

  • Venue:
  • International Journal on Document Analysis and Recognition
  • Year:
  • 2005

Abstract

This paper is concerned with research on OCR (optical character recognition) of printed mathematical expressions. The construction of a representative corpus of technical and scientific documents containing expressions is discussed. A statistical investigation of the corpus is presented, and the usefulness of this analysis is demonstrated for four related research problems: (i) identification and segmentation of expression zones from the rest of the document, (ii) recognition of expression symbols, (iii) interpretation of expression structures, and (iv) performance evaluation of a mathematical expression recognition system. Moreover, a groundtruthing format is proposed to facilitate automatic evaluation of expression recognition techniques.