We present an approach to extracting mathematical formulae directly from PDF documents. We exploit the exact character information, together with the font and spacing information available in a PDF document, to ensure faithful recognition of mathematical expressions. The extracted information can be post-processed into suitable markup that is re-inserted into the PDF document, so that accessibility technology can handle the mathematical formulae. Furthermore, we demonstrate how we recognise different types of mathematical objects, such as relations and operators, without reference to predefined knowledge or dictionary lookup, using only character clustering together with the spacing between characters and character font information. All of this contributes to our goal of reconstructing the intended semantics of a formula from its presentation.
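To make the idea of clustering characters by spacing and font information concrete, here is a minimal Python sketch. It is not the authors' algorithm: the `Glyph` record, the `cluster_by_gap` function, the `gap_ratio` threshold, and the font names are all hypothetical, chosen only to illustrate how adjacent characters on one line might be grouped without any dictionary lookup.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Glyph:
    """One character as it might be read from a PDF (hypothetical record)."""
    char: str
    x0: float    # left edge of the glyph box
    x1: float    # right edge of the glyph box
    font: str    # font name reported for the glyph
    size: float  # font size in points


def cluster_by_gap(glyphs: List[Glyph], gap_ratio: float = 0.25) -> List[List[Glyph]]:
    """Group glyphs on one line into clusters.

    Two neighbouring glyphs stay in the same cluster only if they share a
    font and the horizontal gap between them is at most gap_ratio times
    the larger of their font sizes. The threshold is an assumed heuristic.
    """
    ordered = sorted(glyphs, key=lambda g: g.x0)
    clusters: List[List[Glyph]] = []
    for glyph in ordered:
        if clusters:
            prev = clusters[-1][-1]
            gap = glyph.x0 - prev.x1
            threshold = gap_ratio * max(prev.size, glyph.size)
            if glyph.font == prev.font and gap <= threshold:
                clusters[-1].append(glyph)
                continue
        clusters.append([glyph])
    return clusters


def cluster_text(clusters: List[List[Glyph]]) -> List[str]:
    """Join the characters of each cluster into a string."""
    return ["".join(g.char for g in cluster) for cluster in clusters]


# Made-up coordinates for the line "sin x = y": upright letters set tightly,
# an italic variable in a different font, and a relation symbol with wide gaps.
example = [
    Glyph("s", 0.0, 4.0, "Roman", 10.0),
    Glyph("i", 4.0, 6.0, "Roman", 10.0),
    Glyph("n", 6.0, 10.0, "Roman", 10.0),
    Glyph("x", 12.0, 17.0, "MathItalic", 10.0),
    Glyph("=", 21.0, 28.0, "Roman", 10.0),
    Glyph("y", 32.0, 37.0, "MathItalic", 10.0),
]
```

Calling `cluster_text(cluster_by_gap(example))` on this made-up line yields `['sin', 'x', '=', 'y']`: the tightly set upright letters form one unit, the change of font separates the variable, and the wider gaps around `=` isolate it as a candidate relation symbol.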