Optical character recognition for printed Hindi text in Devnagari using soft-computing technique

Authors:
Divakar Yadav; A. K. Sharma; J. P. Gupta
Affiliations:
JIIT, Noida, India; YMCA, Faridabad, India; JIIT, Noida, India
Venue:
AIAP'07 Proceedings of the 25th IASTED International Multi-Conference: Artificial Intelligence and Applications
Year:
2007

Abstract

In this paper, we present an OCR system for printed Hindi text in Devnagari script. In text written in Devnagari script, the characters are not separated from one another. Hindi is one of the most widely spoken languages in India, where about 300 million people speak it. One of the important reasons for a poor recognition rate in an optical character recognition (OCR) system is error in character segmentation. The preprocessing tasks considered in this paper are conversion of grayscale images to binary images, image rectification, and segmentation of text into lines, words, and basic symbols. Basic symbols are identified as the fundamental unit of segmentation in this paper and are recognized by a neural classifier. We use three feature extraction techniques: histogram of projection based on mean distance, histogram of projection based on pixel value, and vertical zero crossing. These techniques are powerful enough to extract features even from distorted characters. A back-propagation neural network with two hidden layers is used to create the character recognition system. The system is trained and evaluated on printed text. A recognition rate of approximately 90% is achieved.
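
The preprocessing and feature steps named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the definitions, so the fixed threshold of 128, the use of a horizontal projection profile to find text lines, and the reading of "vertical zero crossing" as the number of background-to-ink transitions in each column are all assumptions, and every function name below is hypothetical.

```python
from typing import List, Tuple

Image = List[List[int]]  # rows of pixel values


def binarize(gray: Image, threshold: int = 128) -> Image:
    """Map grayscale pixels to 1 (ink) or 0 (background).

    Assumption: dark pixels are ink and a fixed threshold is adequate.
    """
    return [[1 if px < threshold else 0 for px in row] for row in gray]


def horizontal_projection(binary: Image) -> List[int]:
    """Count the ink pixels in each row."""
    return [sum(row) for row in binary]


def segment_lines(binary: Image) -> List[Tuple[int, int]]:
    """Return (start_row, end_row_exclusive) for each run of rows with ink."""
    lines: List[Tuple[int, int]] = []
    start = None
    for y, count in enumerate(horizontal_projection(binary)):
        if count > 0 and start is None:
            start = y  # first inked row of a new text line
        elif count == 0 and start is not None:
            lines.append((start, y))  # blank row closes the line
            start = None
    if start is not None:
        lines.append((start, len(binary)))
    return lines


def vertical_zero_crossings(binary: Image) -> List[int]:
    """For each column, count background-to-ink transitions from top to bottom.

    Assumption: this is one plausible reading of "vertical zero crossing".
    """
    if not binary:
        return []
    counts: List[int] = []
    for x in range(len(binary[0])):
        prev = 0
        crossings = 0
        for row in binary:
            if prev == 0 and row[x] == 1:
                crossings += 1
            prev = row[x]
        counts.append(crossings)
    return counts


# Tiny synthetic image: two "text lines" separated by a blank row.
gray = [
    [255, 255, 255, 255],
    [0, 255, 0, 255],
    [255, 255, 0, 255],
    [255, 255, 255, 255],
    [0, 0, 255, 255],
]
binary = binarize(gray)
print(segment_lines(binary))            # [(1, 3), (4, 5)]
print(vertical_zero_crossings(binary))  # [2, 1, 1, 0]
```

The same projection idea, applied column-wise inside each detected line, is a common way to split a line into words. A feature vector such as the per-column crossing counts could then be fed to a classifier like the two-hidden-layer back-propagation network the abstract mentions.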