An Omnifont Open-Vocabulary OCR System for English and Arabic
IEEE Transactions on Pattern Analysis and Machine Intelligence
Multilingual machine printed OCR
Hidden Markov models
A Generic Approach for the Vietnamese Handwritten and Speech Recognition Problems
IEA/AIE '02 Proceedings of the 15th international conference on Industrial and engineering applications of artificial intelligence and expert systems: developments in applied artificial intelligence
SACH'06 Proceedings of the 2006 conference on Arabic and Chinese handwriting recognition
Rumours of the death of the problem of machine-printed text recognition have been greatly exaggerated. Reported results can be good enough to suggest that this is a "solved problem", but closer analysis reveals test data that is often limited in its range of fonts and point sizes. Worse still, results are commonly quoted for noise-free images, ignoring the problems of recognising "real" documents such as faxes. Various methods have been proposed for modelling characters with Hidden Markov Models (HMMs). The authors, amongst others, have suggested representing a character by analysing the pixel pattern in each column of its image and linking the sequence of column patterns together with an HMM. In this paper we propose a method of quantising these column patterns by means of a Shift Invariant Hamming Distance. A full experimental evaluation (45 fonts, 5 point sizes) under typical noise conditions yields a recognition accuracy of 99% within the top three choices and a top-choice accuracy of 94% for the best font. The method has a significant advantage in recognising noisy word images because words are classified without first being segmented into characters.
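The abstract does not define the distance or the model in detail, so the following Python sketch only illustrates the general idea under stated assumptions. It assumes each image column is a binary list (1 = ink, 0 = background), that a "shift" is a vertical translation of at most `max_shift` pixels with background fill, that the codebook of column patterns is given rather than learned, and that the quantised column symbols are then scored by a discrete HMM with the standard forward algorithm. All names (`shift_invariant_hamming`, `quantise`, `forward_log_likelihood`) and the toy codebook and model parameters are hypothetical; this is not the authors' implementation.

```python
import math
from typing import List, Sequence

Column = Sequence[int]  # one pixel column of a word image: 1 = ink, 0 = background


def shift(col: Column, s: int) -> List[int]:
    """Translate a column vertically by s pixels (positive = down), filling vacated cells with background."""
    h = len(col)
    out = [0] * h
    for i, v in enumerate(col):
        if 0 <= i + s < h:
            out[i + s] = v
    return out


def hamming(a: Column, b: Column) -> int:
    """Number of pixel positions at which two equal-height columns differ."""
    return sum(x != y for x, y in zip(a, b))


def shift_invariant_hamming(a: Column, b: Column, max_shift: int = 1) -> int:
    """Smallest Hamming distance between a and b over vertical shifts of b in [-max_shift, max_shift].

    Assumption: shifts are bounded and zero-filled; the paper's exact definition may differ.
    """
    if len(a) != len(b):
        raise ValueError("columns must have the same height")
    return min(hamming(a, shift(b, s)) for s in range(-max_shift, max_shift + 1))


def quantise(col: Column, codebook: Sequence[Column], max_shift: int = 1) -> int:
    """Index of the nearest codeword under the shift-invariant distance (ties go to the lower index)."""
    return min(range(len(codebook)),
               key=lambda k: shift_invariant_hamming(col, codebook[k], max_shift))


def forward_log_likelihood(obs: Sequence[int],
                           start: Sequence[float],
                           trans: Sequence[Sequence[float]],
                           emit: Sequence[Sequence[float]]) -> float:
    """log P(obs | model) for a discrete HMM, computed with the (unscaled) forward algorithm.

    start[i] = P(first state is i), trans[j][i] = P(j -> i), emit[i][o] = P(symbol o | state i).
    Without scaling this underflows on long sequences; practical systems use scaled or log-space alphas.
    """
    n = len(start)
    alpha = [start[i] * emit[i][obs[0]] for i in range(n)]
    for o in obs[1:]:
        alpha = [sum(alpha[j] * trans[j][i] for j in range(n)) * emit[i][o] for i in range(n)]
    total = sum(alpha)
    return math.log(total) if total > 0 else float("-inf")


if __name__ == "__main__":
    # Hypothetical codebook of column patterns (height 8).
    codebook = [
        [0, 0, 0, 0, 0, 0, 0, 0],  # 0: blank column
        [0, 0, 1, 1, 1, 1, 0, 0],  # 1: short centred stroke
        [1, 1, 1, 1, 1, 1, 1, 1],  # 2: full-height stroke
    ]
    col = [0, 0, 0, 1, 1, 1, 1, 0]  # codeword 1 displaced one pixel downwards
    print(hamming(col, codebook[1]))                  # 2: plain Hamming penalises the shift
    print(shift_invariant_hamming(col, codebook[1]))  # 0: the shift is absorbed
    print(quantise(col, codebook))                    # 1

    # Toy two-state left-to-right HMM over the three codeword symbols.
    start = [1.0, 0.0]
    trans = [[0.5, 0.5], [0.0, 1.0]]
    emit = [[0.1, 0.8, 0.1], [0.8, 0.1, 0.1]]
    print(forward_log_likelihood([1, 1, 0], start, trans, emit))  # log(0.176), about -1.737
```

Keeping `max_shift` small matters in this sketch: because shifted-out pixels are replaced by background, a large shift can make a tall codeword look like a shorter one. The sketch scores a symbol sequence against a single model only; avoiding explicit segmentation, as the abstract describes, would typically involve matching whole-word symbol sequences against models built from character HMMs, which is not shown here.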