Multilingual OCR system for South Indian scripts and English documents: An approach based on Fourier transform and principal component analysis

Authors:
V. N. Manjunath Aradhya; G. Hemantha Kumar; S. Noushath
Affiliations:
Department of Studies in Computer Science, University of Mysore, Mysore 570006, Karnataka, India (all authors)
Venue:
Engineering Applications of Artificial Intelligence
Year:
2008


Abstract

Character recognition lies at the core of the discipline of pattern recognition, where the aim is to represent a sequence of characters taken from an alphabet [Kasturi, R., O'Gorman, L., Govindaraju, V., 2002. Document image analysis: a primer. Sadhana 27 (Part 1), 3-22]. Although many kinds of features have been developed and their performance on standard databases has been reported, there is still room to improve recognition rates by designing better features. In this paper, we present a multilingual character recognition system for printed South Indian scripts (Kannada, Telugu, Tamil and Malayalam) and English documents. South Indian languages are among the most popular languages in India and around the world. The proposed system is based on the Fourier transform and principal component analysis (PCA), two classical feature extraction and data representation techniques widely used in image processing, pattern recognition and computer vision. Experimental results show good performance on the data sets considered.
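The abstract names the two ingredients but not how they are combined. Below is a minimal sketch under stated assumptions, not the authors' implementation: each character is assumed to be already segmented and size-normalized to a fixed grid (32×32 here, an assumption); the feature is the flattened 2-D FFT magnitude spectrum; PCA is fitted by SVD on the centered training spectra; and classification uses a 1-nearest-neighbour rule in the PCA subspace, a classifier chosen here only for illustration. The names `fourier_features`, `FourierPCAClassifier` and `bar`, and the toy stroke images, are hypothetical.

```python
import numpy as np


def fourier_features(images):
    """Flatten the 2-D FFT magnitude spectrum of each (H x W) character image.

    The magnitude |F(u, v)| is unchanged by a circular shift of the glyph
    inside its frame, which makes it tolerant to small misalignments.
    """
    images = np.asarray(images, dtype=float)
    spectra = np.abs(np.fft.fft2(images, axes=(-2, -1)))
    return spectra.reshape(len(images), -1)


class FourierPCAClassifier:
    """Hypothetical pipeline: FFT magnitude -> PCA subspace -> 1-nearest neighbour."""

    def __init__(self, n_components=20):
        self.n_components = n_components

    def fit(self, images, labels):
        x = fourier_features(images)
        self.mean_ = x.mean(axis=0)
        centered = x - self.mean_
        # Rows of vt are the principal axes (eigenvectors of the sample
        # covariance), sorted by decreasing singular value.
        _, _, vt = np.linalg.svd(centered, full_matrices=False)
        self.components_ = vt[: self.n_components]
        self.train_proj_ = centered @ self.components_.T
        self.labels_ = np.asarray(labels)
        return self

    def transform(self, images):
        # Project new spectra onto the learned principal axes.
        return (fourier_features(images) - self.mean_) @ self.components_.T

    def predict(self, images):
        proj = self.transform(images)
        # Squared Euclidean distance from each query to each training sample.
        dist = ((proj[:, None, :] - self.train_proj_[None, :, :]) ** 2).sum(axis=-1)
        return self.labels_[dist.argmin(axis=1)]


def bar(vertical, pos, size=32):
    """Synthetic stand-in for a glyph: a one-pixel vertical or horizontal stroke."""
    img = np.zeros((size, size))
    if vertical:
        img[:, pos] = 1.0
    else:
        img[pos, :] = 1.0
    return img


train = np.stack([bar(True, 5), bar(False, 5), bar(True, 10), bar(False, 12)])
clf = FourierPCAClassifier(n_components=1).fit(train, ["|", "-", "|", "-"])
# Queries are the same strokes at positions never seen in training.
print(clf.predict(np.stack([bar(True, 20), bar(False, 27)])))
```

In this toy case the queries differ from the training strokes only by a circular shift, so their magnitude spectra coincide and both are matched to the correct class. Real glyphs are rarely pure circular shifts of one another, so the example overstates that invariance; PCA's role here is to compress the 1,024-dimensional spectrum into a few components before matching.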