Performance analysis of feature extractors and classifiers for script recognition of English and Gurmukhi words

Authors:
Rajneesh Rani; Renu Dhir; Gurpreet Singh Lehal
Affiliations:
NIT Jalandhar, Punjab, India; NIT Jalandhar, Punjab, India; Punjabi University, Patiala, Punjab, India
Venue:
Proceeding of the workshop on Document Analysis and Recognition
Year:
2012

Abstract

Script recognition is a challenging problem in document recognition for a multilingual country like India, where many different scripts are in use. For optical character recognition of such multilingual documents, the blocks, lines, words and characters of different scripts must be separated before they are fed to the OCR engines of the individual scripts. Researchers have proposed many approaches to script recognition at different levels (block, line, word and character). Indian documents written in a state language commonly contain English words mixed with words in that state language. In this paper, we extract three types of features (structural, Gabor and Discrete Cosine Transform (DCT) features) from isolated English and Gurmukhi words and compare their script recognition performance using three classifiers: Support Vector Machine (SVM), k-Nearest Neighbor and Parzen Probabilistic Neural Network (PNN).
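
The feature-plus-classifier pipeline the abstract describes can be illustrated with a minimal sketch, shown here for one pairing only: low-frequency 2-D DCT coefficients fed to a k-nearest-neighbor vote. This is not the authors' implementation. The function names, the 4 x 4 coefficient block, k = 3, and the synthetic "word images" (random noise, with a bright bar near the top standing in for the Gurmukhi headline) are all assumptions made for illustration.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II basis matrix of size n x n."""
    k = np.arange(n)[:, None]   # frequency index (rows)
    x = np.arange(n)[None, :]   # sample index (columns)
    m = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * x + 1) * k / (2 * n))
    m[0, :] = np.sqrt(1.0 / n)  # DC row uses a different scale factor
    return m

def dct_features(image, keep=4):
    """2-D DCT of a grayscale image; keep the top-left keep x keep
    low-frequency block as a flat feature vector (assumed feature choice)."""
    rows, cols = image.shape
    coeffs = dct_matrix(rows) @ image @ dct_matrix(cols).T
    return coeffs[:keep, :keep].ravel()

def knn_predict(train_x, train_y, query, k=3):
    """Majority vote among the k nearest training vectors (Euclidean distance)."""
    dists = np.linalg.norm(train_x - query, axis=1)
    nearest = train_y[np.argsort(dists)[:k]]
    labels, counts = np.unique(nearest, return_counts=True)
    return labels[np.argmax(counts)]

def toy_word(script, rng, shape=(16, 32)):
    """Hypothetical stand-in for a word image, NOT real data:
    faint noise, plus a bright horizontal bar for 'gurmukhi'."""
    img = rng.random(shape) * 0.3
    if script == "gurmukhi":
        img[2:4, :] = 1.0       # crude imitation of the headline stroke
    return img

rng = np.random.default_rng(0)
labels = np.array(["english", "gurmukhi"] * 10)
train_x = np.stack([dct_features(toy_word(s, rng)) for s in labels])

# Classify one unseen synthetic sample of each script.
pred_gurmukhi = knn_predict(train_x, labels, dct_features(toy_word("gurmukhi", rng)))
pred_english = knn_predict(train_x, labels, dct_features(toy_word("english", rng)))
print(pred_gurmukhi, pred_english)
```

Replacing `dct_features` with a structural or Gabor extractor, or `knn_predict` with an SVM or PNN, would give the other feature and classifier combinations the paper compares. The toy data says nothing about the accuracy reported in the paper.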