SMC'09: Proceedings of the 2009 IEEE International Conference on Systems, Man and Cybernetics
Decoding noisy document images is commonly needed in applications such as enterprise content management. Available OCR solutions are still unsatisfactory, especially on noisy images, and re-trainable systems require difficult and tedious preparation of training examples. Motivated by this challenging real-world application, we propose a novel solution that organically combines generative template models with discriminative classifiers via an RBF Fisher kernel derived from a generative model. We show that the new approach is highly accurate in decoding noisy document images and makes the system more generalizable to variations in font and degradation, and hence significantly reduces the burden of training example preparation. We also show that, because it weights the pixel features by their relevance, the RBF Fisher kernel is more robust and leads to smaller, faster models through dimensionality reduction.
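To make the idea of an RBF kernel built on Fisher scores more concrete, here is a minimal sketch. The abstract does not specify the generative model, so this sketch assumes an independent-Bernoulli pixel template, in which theta[i] is the probability that pixel i is black. The function names, the clipping constant eps, and the bandwidth gamma are all hypothetical, and this is not the authors' implementation. Under that assumption, the Fisher score of a binary image, normalized by the diagonal Fisher information, is (x[i] - theta[i]) / sqrt(theta[i] * (1 - theta[i])), so pixels where the template is confident are weighted more heavily than ambiguous ones.

```python
import math


def fisher_score(x, theta, eps=1e-3):
    """Normalized Fisher score of a binary image x under an assumed
    independent-Bernoulli template theta (hypothetical model choice)."""
    score = []
    for xi, ti in zip(x, theta):
        # Clip the template probability to avoid division by zero.
        ti = min(max(ti, eps), 1.0 - eps)
        score.append((xi - ti) / math.sqrt(ti * (1.0 - ti)))
    return score


def rbf_fisher_kernel(x, y, theta, gamma=0.1):
    """RBF kernel evaluated on Fisher scores instead of raw pixels."""
    sx = fisher_score(x, theta)
    sy = fisher_score(y, theta)
    sq_dist = sum((a - b) ** 2 for a, b in zip(sx, sy))
    return math.exp(-gamma * sq_dist)


# Toy three-pixel template: one ambiguous pixel, two confident ones.
theta = [0.5, 0.9, 0.1]
a = [1, 1, 0]
b = [0, 1, 0]  # differs from a at the ambiguous pixel (theta = 0.5)
c = [1, 0, 0]  # differs from a at a confident pixel (theta = 0.9)

k_ab = rbf_fisher_kernel(a, b, theta)
k_ac = rbf_fisher_kernel(a, c, theta)
```

In this toy case k_ab is larger than k_ac: a flip at the ambiguous pixel changes the kernel value less than a flip at a pixel the template is confident about. This is one way a Fisher kernel can weight pixels by relevance. A kernel function of this shape could be passed to any kernel-based discriminative classifier, such as a support vector machine.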