Mathematical documents are analyzed from several viewpoints to support the development of practical OCR for mathematical and other scientific documents. Specifically, four viewpoints are quantified using a large-scale database of mathematical documents containing 690,000 manually ground-truthed characters: (i) the number of character categories, (ii) abnormal characters (e.g., touching characters), (iii) character size variation, and (iv) the complexity of mathematical expressions. The results of these analyses clarify the difficulties of recognizing mathematical documents and suggest several promising directions for overcoming them.