A Linear Grammar Approach to Mathematical Formula Recognition from PDF

Authors:
Josef B. Baker; Alan P. Sexton; Volker Sorge
Affiliations:
School of Computer Science, University of Birmingham (all authors)
Venue:
Calculemus '09/MKM '09 Proceedings of the 16th Symposium, 8th International Conference. Held as Part of CICM '09 on Intelligent Computer Mathematics
Year:
2009

Abstract

Many approaches have been proposed over the years for the recognition of mathematical formulae from scanned documents. More recently a need has arisen to recognise formulae from PDF documents. Here we can avoid ambiguities introduced by traditional OCR approaches and instead extract perfect knowledge of the characters used in formulae directly from the document. This can be exploited by formula recognition techniques to achieve correct results and high performance. In this paper we revisit an old grammatical approach to formula recognition, that of Anderson from 1968, and assess its applicability with respect to data extracted from PDF documents. We identify some problems of the original method when applied to common mathematical expressions and show how they can be overcome. The simplicity of the original method leads to a very efficient recognition technique that not only is very simple to implement but also yields results of high accuracy for the recognition of mathematical formulae from PDF documents.
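
As a rough illustration of why exact character data from a PDF helps, the sketch below turns a set of glyphs with known bounding boxes into a linear string by comparing coordinates. It is not Anderson's grammar and not the authors' method: the `Glyph` type, the `relative_position` and `linearize` functions, and the 25%/75% height thresholds are all assumptions made for this example, and it handles only a single level of superscripts and subscripts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Glyph:
    """A character with its bounding box (hypothetical type; y grows upwards)."""
    char: str
    x0: float  # left
    y0: float  # bottom
    x1: float  # right
    y1: float  # top


def relative_position(base: Glyph, g: Glyph) -> str:
    """Return '^', '_' or '' for g relative to the last baseline glyph.

    The 25% / 75% thresholds are illustrative assumptions, not values
    taken from the paper.
    """
    height = base.y1 - base.y0
    centre = (g.y0 + g.y1) / 2
    if centre > base.y0 + 0.75 * height:
        return "^"
    if centre < base.y0 + 0.25 * height:
        return "_"
    return ""


def linearize(glyphs: list[Glyph]) -> str:
    """Produce a LaTeX-like string from glyphs, reading left to right."""
    parts: list[str] = []
    base = None          # most recent glyph placed on the baseline
    mode = ""            # script mode of the run being collected
    run: list[str] = []  # characters of the current script run

    def flush() -> None:
        # Emit the pending script run, if any, as ^{...} or _{...}.
        nonlocal mode, run
        if run:
            parts.append(mode + "{" + "".join(run) + "}")
        mode, run = "", []

    for g in sorted(glyphs, key=lambda gl: gl.x0):
        pos = "" if base is None else relative_position(base, g)
        if pos == "":
            flush()
            parts.append(g.char)
            base = g
        else:
            if pos != mode:
                flush()
                mode = pos
            run.append(g.char)
    flush()
    return "".join(parts)


# Made-up coordinates for the expression x^2 + y_i.
example = [
    Glyph("x", 0, 0, 5, 10),
    Glyph("2", 5, 6, 8, 12),
    Glyph("+", 9, 2, 14, 8),
    Glyph("y", 15, 0, 20, 10),
    Glyph("i", 20, -3, 22, 3),
]
print(linearize(example))
```

Because the characters and their positions are known exactly, each spatial relation reduces to a coordinate comparison. A grammar-based recogniser would replace the fixed thresholds here with rules over such relations.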