An OCR System for Telugu

Authors:
Affiliations:
Venue: ICDAR '01 Proceedings of the Sixth International Conference on Document Analysis and Recognition
Year: 2001

Citing 0
Cited 10

A Bilingual OCR for Hindi-Telugu Documents and its Applications
ICDAR '03 Proceedings of the Seventh International Conference on Document Analysis and Recognition - Volume 1

Recognition of Printed Urdu Script
ICDAR '03 Proceedings of the Seventh International Conference on Document Analysis and Recognition - Volume 2

Localization, Extraction and Recognition of Text in Telugu Document Images
ICDAR '03 Proceedings of the Seventh International Conference on Document Analysis and Recognition - Volume 2

Multilingual OCR system for South Indian scripts and English documents: An approach based on Fourier transform and principal component analysis
Engineering Applications of Artificial Intelligence

Digit extraction and recognition from machine printed Gurmukhi documents
Proceedings of the International Workshop on Multilingual OCR

Handwritten character recognition of popular south Indian scripts
SACH'06 Proceedings of the 2006 conference on Arabic and Chinese handwriting recognition

Nearest neighbor based collection OCR
DAS '10 Proceedings of the 9th IAPR International Workshop on Document Analysis Systems

OCR of printed telugu text with high recognition accuracies
ICVGIP'06 Proceedings of the 5th Indian conference on Computer Vision, Graphics and Image Processing

On performance analysis of end-to-end OCR systems of Indic scripts
Proceeding of the workshop on Document Analysis and Recognition

A syntactic PR approach to Telugu handwritten character recognition
Proceeding of the workshop on Document Analysis and Recognition

Abstract

Telugu is the language spoken by more than 100 million people of South India. Telugu has a complex orthography with a large number of distinct character shapes (estimated to be of the order of 10,000) composed of simple and compound characters formed from 16 vowels (called achchus) and 36 consonants (called hallus). Here we present an efficient and practical approach to Telugu OCR which limits the number of templates to be recognized to just 370, avoiding issues of classifier design for thousands of shapes or very complex glyph segmentation. A compositional approach using connected components and fringe distance template matching achieved a raw OCR accuracy of about 92% in testing. Several experiments across varying fonts and resolutions showed the approach to be satisfactory.
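
The abstract does not spell out how fringe distance template matching is computed, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes one common formulation: each glyph gets a "fringe map" holding the distance from every pixel to the nearest ink pixel, and the distance between two glyphs is the sum of those values sampled at the other glyph's ink pixels, taken in both directions. The city-block metric, the `#`/`.` glyph encoding, the toy 3x3 templates, and all function names are hypothetical choices made for illustration.

```python
from typing import Dict, List, Tuple

# Assumed encoding: a glyph is a list of equal-length rows,
# where '#' marks ink and '.' marks background.
Glyph = List[str]


def ink_pixels(glyph: Glyph) -> List[Tuple[int, int]]:
    """Return the (row, col) coordinates of every ink pixel."""
    return [(r, c)
            for r, row in enumerate(glyph)
            for c, ch in enumerate(row)
            if ch == "#"]


def fringe_map(glyph: Glyph) -> List[List[int]]:
    """City-block distance from each pixel to the nearest ink pixel."""
    ink = ink_pixels(glyph)
    if not ink:
        raise ValueError("glyph contains no ink pixels")
    return [[min(abs(r - ir) + abs(c - ic) for ir, ic in ink)
             for c in range(len(glyph[0]))]
            for r in range(len(glyph))]


def fringe_distance(a: Glyph, b: Glyph) -> int:
    """Symmetric fringe distance between two glyphs of the same size."""
    map_a, map_b = fringe_map(a), fringe_map(b)
    a_to_b = sum(map_b[r][c] for r, c in ink_pixels(a))
    b_to_a = sum(map_a[r][c] for r, c in ink_pixels(b))
    return a_to_b + b_to_a


def classify(component: Glyph, templates: Dict[str, Glyph]) -> str:
    """Return the name of the template closest to the component."""
    return min(templates,
               key=lambda name: fringe_distance(component, templates[name]))


# Toy templates standing in for the 370 used by the system.
TEMPLATES: Dict[str, Glyph] = {
    "bar": [".#.",
            ".#.",
            ".#."],
    "dash": ["...",
             "###",
             "..."],
}

# A vertical bar with one stray ink pixel in the bottom-left corner.
NOISY: Glyph = [".#.",
                ".#.",
                "##."]

print(classify(NOISY, TEMPLATES))
```

In a full pipeline, each connected component extracted from the page image would presumably be size-normalized before being passed to a matcher like `classify`; that step is omitted here, and the brute-force fringe map would normally be replaced by a proper distance transform for speed.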