A post-processing scheme for Malayalam using statistical sub-character language models
DAS '10 Proceedings of the 9th IAPR International Workshop on Document Analysis Systems
In this paper, we empirically study the performance of a set of pattern classification schemes for character classification problems. We argue that, given a rich feature space, this class of problems can be solved with reasonable success using a set of statistical feature extraction schemes. Experimental validation is performed on a data set of more than 500,000 characters, collected and annotated from books printed primarily in Malayalam. The scope of this study includes (a) the applicability of a spectrum of classifiers and features, (b) the scalability of classifiers, (c) the sensitivity of features to degradation, (d) generalization across fonts, and (e) applicability across scripts.
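The abstract does not name the specific features or classifiers studied. As a rough illustration of what a "statistical feature extraction scheme plus classifier" pipeline can look like, here is a minimal sketch that assumes zoning features (ink density per grid cell) and a nearest-class-mean classifier. Neither technique is confirmed by the source, and the function names `zoning_features`, `train_nearest_mean`, and `classify` are hypothetical. The toy usage classifies a glyph with one ink pixel dropped, loosely echoing the degradation-sensitivity question in (c).

```python
from __future__ import annotations

from math import dist
from statistics import fmean


def zoning_features(img: list[list[int]], zones: int = 4) -> list[float]:
    """Hypothetical feature: fraction of ink (1) pixels in each cell of a zones x zones grid."""
    h, w = len(img), len(img[0])
    feats = []
    for zr in range(zones):
        r0, r1 = zr * h // zones, (zr + 1) * h // zones
        for zc in range(zones):
            c0, c1 = zc * w // zones, (zc + 1) * w // zones
            cell = [img[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            feats.append(sum(cell) / len(cell) if cell else 0.0)
    return feats


def train_nearest_mean(samples, zones: int = 4) -> dict[str, list[float]]:
    """Average the feature vectors of each class; samples are (image, label) pairs."""
    by_label: dict[str, list[list[float]]] = {}
    for img, label in samples:
        by_label.setdefault(label, []).append(zoning_features(img, zones))
    return {label: [fmean(col) for col in zip(*vecs)] for label, vecs in by_label.items()}


def classify(model: dict[str, list[float]], img: list[list[int]], zones: int = 4) -> str:
    """Return the label whose class-mean vector is closest in Euclidean distance."""
    f = zoning_features(img, zones)
    return min(model, key=lambda label: dist(f, model[label]))


# Toy 4x4 binary "glyphs" (assumed data, not from the study's corpus).
left_bar = [[1, 1, 0, 0]] * 4                       # ink in the left half
top_bar = [[1, 1, 1, 1]] * 2 + [[0, 0, 0, 0]] * 2   # ink in the top half
model = train_nearest_mean([(left_bar, "left"), (top_bar, "top")], zones=2)

# A "degraded" left bar with one ink pixel dropped.
noisy = [[1, 1, 0, 0], [1, 0, 0, 0], [1, 1, 0, 0], [1, 1, 0, 0]]
print(classify(model, noisy, zones=2))  # left
```

Nearest-class-mean is chosen here only because it is short and has no external dependencies; a real comparison across "a spectrum of classifiers" would swap in other models over the same feature vectors.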