A Bilingual OCR for Hindi-Telugu Documents and its Applications

Authors:
C. V. Jawahar; M. N. S. S. K. Pavan Kumar; S. S. Ravi Kiran
Venue:
ICDAR '03 Proceedings of the Seventh International Conference on Document Analysis and Recognition - Volume 1
Year:
2003

Abstract

This paper describes the character recognition process from printed documents containing Hindi and Telugu text. Hindi and Telugu are among the most popular languages in India. The bilingual recognizer is based on Principal Component Analysis followed by support vector classification. This attains an overall accuracy of approximately 96.7%. Extensive experimentation is carried out on an independent test set of approximately 200000 characters. Applications based on this OCR are sketched.
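
The abstract names a two-stage recognizer: Principal Component Analysis followed by support vector classification. The abstract gives no implementation details, so the following is only a minimal sketch of that general idea, not the authors' system. It assumes synthetic two-class feature vectors in place of character images, a hand-rolled PCA (power iteration with deflation), and a linear SVM trained with Pegasos-style stochastic subgradient steps. All function names and parameters (make_toy_data, pca_fit, train_linear_svm, lam, steps) are hypothetical and chosen for illustration.

```python
import random


def make_toy_data(n_per_class, dim=16, seed=0):
    """Synthetic stand-in for glyph feature vectors: two well-separated classes."""
    rng = random.Random(seed)
    rows, labels = [], []
    for label in (1, -1):
        for _ in range(n_per_class):
            rows.append([label * 2.0 + rng.gauss(0, 0.5) for _ in range(dim)])
            labels.append(label)
    return rows, labels


def pca_fit(rows, k, iters=100, seed=0):
    """Return (mean, components): top-k principal directions of the data."""
    rng = random.Random(seed)
    n, d = len(rows), len(rows[0])
    mu = [sum(col) / n for col in zip(*rows)]
    centered = [[x - m for x, m in zip(r, mu)] for r in rows]
    cov = [[sum(r[i] * r[j] for r in centered) / n for j in range(d)]
           for i in range(d)]
    comps = []
    for _ in range(k):
        v = [rng.gauss(0, 1) for _ in range(d)]
        for _ in range(iters):  # power iteration toward the dominant eigenvector
            w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = sum(x * x for x in w) ** 0.5
            v = [x / norm for x in w]
        eig = sum(v[i] * sum(cov[i][j] * v[j] for j in range(d)) for i in range(d))
        comps.append(v)
        # Deflate: remove the direction just found before searching for the next.
        cov = [[cov[i][j] - eig * v[i] * v[j] for j in range(d)] for i in range(d)]
    return mu, comps


def pca_transform(rows, mu, comps):
    """Project mean-centered rows onto the principal directions."""
    return [[sum((x - m) * c for x, m, c in zip(r, mu, comp)) for comp in comps]
            for r in rows]


def train_linear_svm(X, y, lam=0.01, steps=2000, seed=0):
    """Linear SVM (no bias term) via Pegasos-style subgradient descent on hinge loss."""
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    for t in range(1, steps + 1):
        i = rng.randrange(len(X))
        eta = 1.0 / (lam * t)
        margin = y[i] * sum(wj * xj for wj, xj in zip(w, X[i]))
        w = [(1 - eta * lam) * wj for wj in w]  # shrink from the regularizer
        if margin < 1:  # the sample violates the margin: step toward it
            w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
    return w


def predict(w, X):
    return [1 if sum(wj * xj for wj, xj in zip(w, x)) >= 0 else -1 for x in X]


train_X, train_y = make_toy_data(100, seed=0)
test_X, test_y = make_toy_data(50, seed=1)  # independent test set

mu, comps = pca_fit(train_X, k=2)
w = train_linear_svm(pca_transform(train_X, mu, comps), train_y)
pred = predict(w, pca_transform(test_X, mu, comps))
accuracy = sum(p == t for p, t in zip(pred, test_y)) / len(test_y)
print(f"toy accuracy: {accuracy:.3f}")
```

A recognizer over many character classes would need a multi-class scheme, for example one binary classifier per pair of classes, and features extracted from segmented character images. The toy accuracy printed here says nothing about the figures reported in the abstract.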