Gabor Filter Based Multi-class Classifier for Scanned Document Images
ICDAR '03 Proceedings of the Seventh International Conference on Document Analysis and Recognition - Volume 2
Text Degradations and OCR Training
ICDAR '05 Proceedings of the Eighth International Conference on Document Analysis and Recognition
In this paper we present a multi-font OCR system for document processing that performs character recognition and font-style detection simultaneously, for digits belonging to a subset of the existing fonts. Detecting the font-style of the words in a document can guide a rough automatic classification of documents and can also be used to improve character recognition. The system uses the tangent distance as the classification function in a nearest neighbor approach. It must discriminate among different digits and, for the same digit, among different font-styles. The nearest neighbor approach is always able to recognize the digit, but its performance in font detection is not optimal. To improve the performance of the system, we use a discriminant model, the TD-Neuron, which is employed to discriminate between two similar classes. Experimental results and prospective uses in document processing applications are presented.
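
As an illustration of the nearest neighbor step described above, the following is a minimal sketch of a one-sided tangent distance classifier. It is not the authors' implementation: the abstract does not say how the tangent vectors are obtained or whether a one-sided or two-sided distance is used, so the one-sided form, the function names (`tangent_distance`, `classify`), and the (digit, font) label layout are all assumptions made for this example. The TD-Neuron is not modeled here.

```python
from math import sqrt


def dot(u, v):
    """Inner product of two equal-length vectors given as lists."""
    return sum(a * b for a, b in zip(u, v))


def orthonormal_basis(tangents, eps=1e-12):
    """Gram-Schmidt orthonormalization of the tangent vectors.

    Vectors that are (numerically) linearly dependent are dropped.
    """
    basis = []
    for t in tangents:
        w = list(t)
        for b in basis:
            c = dot(w, b)
            w = [wi - c * bi for wi, bi in zip(w, b)]
        norm = sqrt(dot(w, w))
        if norm > eps:
            basis.append([wi / norm for wi in w])
    return basis


def tangent_distance(x, prototype, tangents):
    """One-sided tangent distance (assumed variant).

    Computes min over a of || prototype + sum_k a[k] * tangents[k] - x ||,
    i.e. the distance from x to the affine subspace spanned by the
    prototype's tangent vectors.
    """
    residual = [xi - pi for xi, pi in zip(x, prototype)]
    # Remove the component of the residual lying in the tangent subspace.
    for b in orthonormal_basis(tangents):
        c = dot(residual, b)
        residual = [ri - c * bi for ri, bi in zip(residual, b)]
    return sqrt(dot(residual, residual))


def classify(x, prototypes):
    """Return the label of the prototype nearest to x.

    prototypes: list of (label, vector, tangents) triples, where label is a
    hypothetical (digit, font_style) pair.
    """
    best = min(prototypes, key=lambda p: tangent_distance(x, p[1], p[2]))
    return best[0]


if __name__ == "__main__":
    # Toy 3-dimensional "images"; real inputs would be flattened digit bitmaps.
    prototypes = [
        ((7, "Arial"), [0.0, 0.0, 0.0], [[1.0, 0.0, 0.0]]),
        ((7, "Times"), [0.0, 3.0, 0.0], []),
    ]
    print(classify([5.0, 3.0, 4.0], prototypes))
```

Because each label carries both the digit and the font-style, a single nearest neighbor search returns both decisions at once. With an empty tangent list the function reduces to the ordinary Euclidean distance.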