Robust Text Line, Word And Character Extraction from Telugu Document Image

Authors:
Vijaya Kumar Koppula;Atul Negi;Utpal Garain
Venue:
ICETET '09 Proceedings of the 2009 Second International Conference on Emerging Trends in Engineering & Technology
Year:
2009

Abstract

Designing an OCR system for Indian languages is, in general, more complex than designing one for European languages due to their linguistic complexity. Efforts are under way to develop efficient OCR systems for Indian languages, especially for Telugu, a popular South Indian language. In this paper, we propose a method for reliable extraction of text lines, words and characters from document images of Telugu script. For text line segmentation, we first establish the relationships between the connected components and then cluster the connected components of each line using their vertical spatial relation and a nearest-neighbor algorithm. For word segmentation, the spaces between adjacent characters are computed and clustered into word spaces and character spaces. Finally, consonant and vowel modifiers are segregated from the word image and the characters are segmented.
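
The line and word segmentation steps described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract gives no algorithmic details, so the sketch assumes that connected components are already available as bounding boxes, replaces the paper's nearest-neighbor clustering with a simple greedy grouping by vertical overlap, and uses a two-cluster 1-D k-means to separate character spaces from word spaces. The names `Box`, `group_into_lines`, `split_gaps` and `split_into_words` are hypothetical.

```python
from typing import List, Tuple

# Hypothetical bounding-box format: (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[int, int, int, int]


def group_into_lines(boxes: List[Box]) -> List[List[Box]]:
    """Greedily assign each box to the line whose vertical span it overlaps most.

    Assumption: a simplification of the vertical spatial relation mentioned
    in the abstract; components that overlap no existing line start a new one.
    """
    lines: List[List[Box]] = []
    spans: List[Tuple[int, int]] = []  # (top, bottom) of each line so far
    for box in sorted(boxes, key=lambda b: b[1]):
        _, top, _, bottom = box
        best, best_overlap = -1, 0
        for i, (line_top, line_bottom) in enumerate(spans):
            overlap = min(bottom, line_bottom) - max(top, line_top)
            if overlap > best_overlap:
                best, best_overlap = i, overlap
        if best < 0:
            lines.append([box])
            spans.append((top, bottom))
        else:
            lines[best].append(box)
            line_top, line_bottom = spans[best]
            spans[best] = (min(line_top, top), max(line_bottom, bottom))
    # Order the components of every line from left to right.
    return [sorted(line, key=lambda b: b[0]) for line in lines]


def split_gaps(gaps: List[float]) -> float:
    """Two-cluster 1-D k-means; returns a threshold between the two gap classes."""
    lo, hi = min(gaps), max(gaps)
    if lo == hi:
        return hi  # only one gap size: nothing exceeds the threshold
    for _ in range(100):
        mid = (lo + hi) / 2
        small = [g for g in gaps if g <= mid]
        large = [g for g in gaps if g > mid]
        new_lo, new_hi = sum(small) / len(small), sum(large) / len(large)
        if (new_lo, new_hi) == (lo, hi):
            break
        lo, hi = new_lo, new_hi
    return (lo + hi) / 2


def split_into_words(line: List[Box]) -> List[List[Box]]:
    """Split a left-to-right sorted line wherever a gap is in the larger cluster."""
    if not line:
        return []
    if len(line) < 2:
        return [line]
    gaps = [max(0, b[0] - a[2]) for a, b in zip(line, line[1:])]
    threshold = split_gaps(gaps)
    words = [[line[0]]]
    for gap, box in zip(gaps, line[1:]):
        if gap > threshold:
            words.append([box])  # word space: start a new word
        else:
            words[-1].append(box)  # character space: same word
    return words
```

In this sketch, a line in which every gap has the same width is kept as a single word, because two clusters cannot be formed from one value. A real system would also have to attach vowel and consonant modifiers that lie above or below the base characters, which plain vertical overlap may miss.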