Digit extraction and recognition from machine printed Gurmukhi documents

  • Authors:
  • Dharam Veer Sharma; Gurpreet Singh Lehal; Preety Kathuria

  • Affiliations:
  • Punjabi University, Patiala, India; Punjabi University, Patiala, India; Punjabi University, Patiala, India

  • Venue:
  • Proceedings of the International Workshop on Multilingual OCR
  • Year:
  • 2009

Abstract

The work presented in this paper addresses the extraction and recognition of digits (Roman as well as Gurmukhi) from machine-printed Gurmukhi documents. The process consists of three stages. The first stage, segmentation, takes an image of a document as input and separates its logical parts: the lines of a paragraph, the words of a line, and the characters of a word. A set of probable digits is then extracted, based on the features that distinguish digits from other Gurmukhi text. The second stage, feature extraction, analyzes the set of probable digits and selects a set of structural and statistical features that can uniquely identify each digit. The selection of a stable and representative feature set is the heart of a digit recognition system. The final stage, classification, is the main decision-making stage of the system: it uses the features extracted in the previous stage to identify the digit. We have used a non-parametric statistical classifier, K-Nearest Neighbour (K-NN), for recognition. The best recognition accuracy is achieved with DDD features: 95% for Roman digits and 92.6% for Gurmukhi digits.
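The abstract names K-Nearest Neighbour as the classifier but does not define the DDD features or the segmentation procedure, so what follows is not the authors' implementation. It is a minimal sketch of the feature-extraction and classification stages, assuming each candidate digit has already been segmented into a small binary image. The feature extractor `zone_density_features` (ink ratio per zone on a 2×2 grid) is a hypothetical stand-in for DDD, and the names, the Euclidean distance, the value of k, and the toy glyphs are all illustrative assumptions.

```python
"""Sketch: zoning-density features + k-nearest-neighbour classification.

Assumption: zoning density is a hypothetical stand-in for the paper's DDD
features, which the abstract does not define.
"""
from collections import Counter
import math
from typing import List, Sequence, Tuple

Image = List[List[int]]               # binary glyph image: 1 = ink, 0 = background
Sample = Tuple[Sequence[float], str]  # (feature vector, digit label)


def zone_density_features(img: Image, rows: int = 2, cols: int = 2) -> List[float]:
    """Split the glyph into rows x cols zones and return the ink ratio of each zone."""
    h, w = len(img), len(img[0])
    feats = []
    for zr in range(rows):
        r0, r1 = zr * h // rows, (zr + 1) * h // rows
        for zc in range(cols):
            c0, c1 = zc * w // cols, (zc + 1) * w // cols
            area = (r1 - r0) * (c1 - c0)
            ink = sum(img[r][c] for r in range(r0, r1) for c in range(c0, c1))
            feats.append(ink / area if area else 0.0)
    return feats


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_classify(train: Sequence[Sample], query: Sequence[float], k: int = 3) -> str:
    """Label `query` by majority vote among its k nearest training samples."""
    if not 1 <= k <= len(train):
        raise ValueError("k must be between 1 and the number of training samples")
    neighbours = sorted(train, key=lambda s: euclidean(s[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    best = max(votes.values())
    # Neighbours are ordered nearest-first, so a tied vote goes to the closest label.
    return next(label for _, label in neighbours if votes[label] == best)


if __name__ == "__main__":
    # Hypothetical 4x4 toy glyphs, not data from the paper.
    one = [[0, 1, 0, 0], [0, 1, 0, 0], [0, 1, 0, 0], [0, 1, 0, 0]]
    seven = [[1, 1, 1, 1], [0, 0, 0, 1], [0, 0, 1, 0], [0, 0, 1, 0]]
    noisy_one = [[0, 1, 1, 0], [0, 1, 0, 0], [0, 1, 0, 0], [0, 1, 0, 0]]
    train = [(zone_density_features(one), "1"), (zone_density_features(seven), "7")]
    print(knn_classify(train, zone_density_features(noisy_one), k=1))  # prints 1
```

K-NN needs no fitted parameters beyond the stored labelled vectors, which is what makes it non-parametric; in practice the training set would hold many samples per digit class, and k would be tuned on held-out data.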