Refinement of digitized documents through recognition of mathematical formulae

  • Authors:
  • Toshihiro KANAHORI; Masakazu SUZUKI

  • Affiliations:
  • Tsukuba University of Technology, Ibaraki, Japan; Kyushu University, Fukuoka, Japan

  • Venue:
  • DIAL '06 Proceedings of the Second International Conference on Document Image Analysis for Libraries
  • Year:
  • 2006

Abstract

We are developing a recognition system, named 'Infty', for scientific documents, including those containing mathematical formulae. In this paper, we propose a new system that refines a text-embedded PDF document by recognizing its pages as images and integrating the embedded text information into the recognition results of Infty. The system can be combined with any other OCR system that outputs its recognition results as text embedded in a PDF document. Using this system, mathematical information can be added to books, journals and papers in existing digital libraries. We evaluate the effect of the system by comparing its recognition rates with those of ABBYY FineReader. The evaluation shows that the system can add mathematical information to PDF documents generated by FineReader without any loss of quality in the ordinary text parts.