OCR of printed Telugu text with high recognition accuracies

Authors:
C. Vasantha Lakshmi;Ritu Jain;C. Patvardhan
Affiliations:
Dayalbagh Educational Institute, Agra, India;Dayalbagh Educational Institute, Agra, India;Dayalbagh Educational Institute, Agra, India
Venue:
ICVGIP'06 Proceedings of the 5th Indian conference on Computer Vision, Graphics and Image Processing
Year:
2006

Abstract

Telugu is one of the oldest and most popular languages of India, spoken by more than 66 million people, especially in South India. Development of Optical Character Recognition (OCR) systems for Telugu text is an area of current research. OCR of Indian scripts is much more complicated than OCR of Roman script because of the huge number of combinations of characters and modifiers. Basic symbols are identified as the unit of recognition in Telugu script, and edge histograms are used as the features of a feature-based recognition scheme for these basic symbols. During recognition, it is observed that in many cases the recognizer incorrectly outputs a very similar-looking symbol. Special logic and algorithms based on simple structural features are therefore developed to improve recognition accuracy considerably without much additional computational effort. It is shown that a recognition accuracy of 98.5% can be achieved on laser-quality prints with this procedure.
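The abstract does not define the edge histogram, the classifier, or the structural rules. The following is therefore only a minimal Python sketch of the general pipeline, built on these assumptions:

- Each basic symbol is a binary image.
- The "edge histogram" is an MPEG-7-inspired count of vertical, horizontal, 45° and 135° edges in each zone of a grid. This is an assumed stand-in, not the paper's definition.
- Matching is nearest-neighbour against per-symbol templates (also assumed).
- The "special logic" is a single hypothetical rule that separates a confusable pair by counting connected components.

The labels `sym_a` and `sym_b` and every function name are hypothetical.

```python
import math
from typing import Callable, Dict, List, Tuple

Image = List[List[int]]  # binary glyph, 1 = ink, 0 = background (at least 2x2)

SQRT2 = math.sqrt(2.0)


def edge_histogram(img: Image, grid: int = 2, threshold: float = 1.0) -> List[float]:
    """Assumed MPEG-7-style feature: per-zone counts of 4 edge directions, summing to 1."""
    h, w = len(img), len(img[0])
    bins = [0.0] * (grid * grid * 4)
    for i in range(h - 1):
        for j in range(w - 1):
            a, b = img[i][j], img[i][j + 1]
            c, d = img[i + 1][j], img[i + 1][j + 1]
            responses = [
                abs(a - b + c - d),  # vertical edge
                abs(a + b - c - d),  # horizontal edge
                SQRT2 * abs(a - d),  # 45-degree edge
                SQRT2 * abs(b - c),  # 135-degree edge
            ]
            best = responses.index(max(responses))
            if responses[best] < threshold:
                continue  # uniform 2x2 block, no edge
            zone = (i * grid // (h - 1)) * grid + (j * grid // (w - 1))
            bins[zone * 4 + best] += 1.0
    total = sum(bins)
    return [v / total for v in bins] if total else bins


def nearest_label(feature: List[float], templates: Dict[str, List[float]]) -> str:
    """Nearest-neighbour match (squared Euclidean distance) against stored templates."""
    return min(
        templates,
        key=lambda label: sum((x - y) ** 2 for x, y in zip(feature, templates[label])),
    )


def count_components(img: Image) -> int:
    """Simple structural feature: number of 4-connected ink components."""
    h, w = len(img), len(img[0])
    seen = [[False] * w for _ in range(h)]
    count = 0
    for si in range(h):
        for sj in range(w):
            if img[si][sj] and not seen[si][sj]:
                count += 1
                seen[si][sj] = True
                stack = [(si, sj)]
                while stack:  # iterative flood fill of one component
                    i, j = stack.pop()
                    for ni, nj in ((i + 1, j), (i - 1, j), (i, j + 1), (i, j - 1)):
                        if 0 <= ni < h and 0 <= nj < w and img[ni][nj] and not seen[ni][nj]:
                            seen[ni][nj] = True
                            stack.append((ni, nj))
    return count


# Hypothetical rule: (plain_label, marked_label, test). The two symbols look
# alike; the test returns True when the glyph should be the marked one.
Rule = Tuple[str, str, Callable[[Image], bool]]
RULES: List[Rule] = [
    ("sym_a", "sym_b", lambda img: count_components(img) > 1),
]


def recognise(img: Image, templates: Dict[str, List[float]], rules: List[Rule] = RULES) -> str:
    label = nearest_label(edge_histogram(img), templates)
    for first, second, prefers_second in rules:
        if label in (first, second):
            # Only outputs from a known confusable pair pay for the structural check.
            return second if prefers_second(img) else first
    return label


if __name__ == "__main__":
    bar = [[0, 0, 1, 1, 0, 0] for _ in range(6)]
    bar_dot = [row[:] for row in bar]
    bar_dot[0][5] = 1  # detached mark, not 4-connected to the bar
    templates = {"sym_a": edge_histogram(bar), "sym_b": edge_histogram(bar_dot)}
    print(recognise(bar, templates), recognise(bar_dot, templates))  # prints: sym_a sym_b
```

The structural rule runs only when the nearest-neighbour label falls in a listed confusable pair. Most symbols therefore skip the extra check, in keeping with the abstract's aim of limited additional computation. Even if the template for `sym_b` were missing and the matcher returned `sym_a`, the component count would still correct the output to `sym_b`.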