Quantitative analysis of mathematical documents

  • Authors: S. Uchida; A. Nomura; M. Suzuki

  • Affiliations: Department of Intelligent Systems, Kyushu University, 6-10-1, Hakozaki, Higashi-ku, Fukuoka-shi, Japan; Department of Mathematics, Kyushu University, 6-10-1, Hakozaki, Higashi-ku, Fukuoka-shi, Japan; Department of Mathematics, Kyushu University, 6-10-1, Hakozaki, Higashi-ku, Fukuoka-shi, Japan

  • Venue: International Journal on Document Analysis and Recognition
  • Year: 2005

Abstract

Mathematical documents are analyzed from several viewpoints to support the development of practical OCR for mathematical and other scientific documents. Specifically, four viewpoints are quantified using a large-scale database of mathematical documents containing 690,000 manually ground-truthed characters: (i) the number of character categories, (ii) abnormal characters (e.g., touching characters), (iii) character size variation, and (iv) the complexity of the mathematical expressions. The results of these analyses clarify the difficulties of recognizing mathematical documents and suggest several promising directions for overcoming them.