Lexicon and hidden Markov model-based optimisation of the recognised Sinhala script

Authors:
H. L. Premaratne;E. Järpe;J. Bigun
Affiliations:
School of Information Science, Computer and Electrical Engineering, Halmstad University, Box 823, S-301 18, Sweden and University of Colombo School of Computing, 35 Reid Avenue, Colombo 07, Sri La ...;School of Information Science, Computer and Electrical Engineering, Halmstad University, Box 823, S-301 18, Sweden;School of Information Science, Computer and Electrical Engineering, Halmstad University, Box 823, S-301 18, Sweden
Venue:
Pattern Recognition Letters
Year:
2006

Abstract

The Brahmi-descended Sinhala script is used by 75% of Sri Lanka's population of 18 million. To the best of our knowledge, none of the Brahmi-descended scripts, which are used by hundreds of millions of people in South Asia, has a commercial OCR product. As part of implementing an OCR system for printed Sinhala script that is easily adaptable to similar scripts [Premaratne, L., Assabie, Y., Bigun, J., 2004. Recognition of modification-based scripts using direction tensors. In: 4th Indian Conf. on Computer Vision, Graphics and Image Processing (ICVGIP2004), pp. 587-592], a segmentation-free recognition method using orientation features was proposed in [Premaratne, H.L., Bigun, J., 2004. A segmentation-free approach to recognise printed Sinhala script using linear symmetry. Pattern Recognition 37, 2081-2089]. Owing to limitations in image analysis techniques, the character-level accuracy of the raw output of that recognition algorithm saturates at 94%. False rejections by the recognition algorithm are initially identified only as 'missing character positions' or 'blank characters'. Suitable substitutes must be found for these positions so that word accuracy reaches an acceptable level. This paper proposes a novel method that uses the lexicon together with hidden Markov models to improve the accuracy of the recognised script. With minor changes, the method could be extended to other modification-based scripts that contain easily confused characters. The proposed optimisation algorithm improves word-level accuracy from 81.5% to 88.5%.
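
The idea of filling 'missing character positions' from a lexicon can be illustrated with a minimal Python sketch. It is not the authors' algorithm: a plain first-order Markov chain over letters stands in for the paper's hidden Markov models, and the blank marker `?`, the function names, and the use of Latin letters instead of Sinhala characters are all assumptions made for illustration. Lexicon words that agree with the recognised characters are collected as candidates, and the candidate with the most probable letter sequence is chosen.

```python
import math
from collections import defaultdict

BLANK = "?"  # hypothetical marker for a 'missing character position'


def train_bigrams(lexicon):
    """Count letter-to-letter transitions, using '^' as a start-of-word symbol."""
    counts = defaultdict(lambda: defaultdict(int))
    for word in lexicon:
        prev = "^"
        for ch in word:
            counts[prev][ch] += 1
            prev = ch
    return counts


def log_prob(word, counts, alphabet_size):
    """Add-one smoothed log-probability of a word under the letter chain."""
    total = 0.0
    prev = "^"
    for ch in word:
        row = counts.get(prev, {})
        seen = row.get(ch, 0)
        total += math.log((seen + 1) / (sum(row.values()) + alphabet_size))
        prev = ch
    return total


def matches(pattern, word):
    """True if the word has the same length and agrees on every non-blank position."""
    return len(pattern) == len(word) and all(
        p == BLANK or p == w for p, w in zip(pattern, word)
    )


def fill_blanks(pattern, lexicon):
    """Return the best-scoring lexicon word consistent with the pattern, or None."""
    counts = train_bigrams(lexicon)
    alphabet_size = len({ch for word in lexicon for ch in word})
    candidates = [word for word in lexicon if matches(pattern, word)]
    if not candidates:
        return None
    return max(candidates, key=lambda w: log_prob(w, counts, alphabet_size))
```

A call such as `fill_blanks("c?t", lexicon)` returns the lexicon word that fits the recognised characters and has the highest chain probability, or `None` when no lexicon word fits. A real system would estimate the transition statistics from a large corpus rather than from the lexicon itself.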