Digit extraction and recognition from machine printed Gurmukhi documents

Authors:
Dharam Veer Sharma;Gurpreet Singh Lehal;Preety Kathuria
Affiliations:
Punjabi University, Patiala, India;Punjabi University, Patiala, India;Punjabi University, Patiala, India
Venue:
Proceedings of the International Workshop on Multilingual OCR
Year:
2009

Abstract

This paper addresses the extraction and recognition of digits (Roman as well as Gurmukhi) from machine-printed Gurmukhi documents. The process consists of three stages. The first, the segmentation stage, takes an image of a document as input and separates it into its logical parts: the lines of a paragraph, the words of a line and the characters of a word. A set of probable digits is then extracted based on the features that distinguish digits from other Gurmukhi text. The next, the feature extraction stage, analyzes the set of probable digits and selects a set of structural and statistical features that can be used to uniquely identify the digits. Selecting a stable and representative set of features is the heart of a digit recognition system. The final, the classification stage, is the main decision-making stage of the system and uses the features extracted in the previous stage to identify each digit. We use a non-parametric statistical classifier, k-nearest neighbour (k-NN), for recognition. The best recognition accuracy is achieved with directional distance distribution (DDD) features: 95% for Roman digits and 92.6% for Gurmukhi digits.
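As an illustration of the classification stage, the following is a minimal sketch of k-nearest-neighbour voting over feature vectors. It is not the authors' implementation: the function name knn_classify, the Euclidean distance metric, the value k=3 and the toy two-dimensional feature vectors are all assumptions made for the example. The abstract names only the classifier family and the DDD features, not these details.

```python
from collections import Counter
import math


def knn_classify(query, training_set, k=3):
    """Return the majority label among the k training vectors nearest to query.

    training_set is a list of (feature_vector, label) pairs. Euclidean
    distance is an assumption; the abstract does not name the metric.
    """
    if not training_set:
        raise ValueError("training_set must not be empty")
    # Sort training samples by distance to the query and keep the k nearest.
    nearest = sorted(training_set, key=lambda item: math.dist(query, item[0]))[:k]
    # Count one vote per neighbour; the most frequent label wins.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Toy 2-D feature vectors standing in for real digit features (hypothetical values).
TRAINING = [
    ((0.0, 0.0), "0"),
    ((0.1, 0.2), "0"),
    ((0.2, 0.1), "0"),
    ((1.0, 1.0), "1"),
    ((0.9, 1.1), "1"),
    ((1.1, 0.9), "1"),
]

print(knn_classify((0.15, 0.1), TRAINING))
print(knn_classify((0.95, 1.0), TRAINING))
```

In a real system each feature vector would come from the feature extraction stage, and k and the distance metric would be tuned on held-out samples.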