Shape Encoded Post Processing of Gurmukhi OCR

  • Authors:
  • Dharam Veer Sharma;Gurpreet Singh Lehal;Sarita Mehta

  • Venue:
  • ICDAR '09 Proceedings of the 2009 10th International Conference on Document Analysis and Recognition
  • Year:
  • 2009

Abstract

A post-processor is an integral part of any OCR system. This paper proposes a method for detecting and correcting errors in the recognition results of handwritten and machine-printed Gurmukhi OCR. The consonants of the Gurmukhi script are divided into sets based on the shape similarity of the characters, and each set is given a unique number. When a recognition error occurs, corrections are made by considering each consonant in the same shape set as the recognized consonant. According to the proposed algorithm, each recognized word is first encoded based on its consonants. The corresponding code is then searched for in the dictionary. If it exists, the words listed under that code are matched against the source word. If a match is found, the word is treated as correct; otherwise, suggestions are made based on the similarity of the source word to the dictionary words that share the same code. The method has been tested on the OCR output of a variety of machine-printed and handwritten documents.
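
The encode-then-look-up flow described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the shape sets in `SHAPE_SETS` are hypothetical (the abstract does not list the actual grouping), the names `encode`, `build_index`, and `check` are invented for this sketch, the similarity measure is Python's `difflib.SequenceMatcher` standing in for whatever measure the paper uses, and the `"unknown"` result for a code missing from the dictionary is an assumption, since the abstract does not say what happens in that case.

```python
from collections import defaultdict
from difflib import SequenceMatcher

# Hypothetical shape-similarity sets of Gurmukhi consonants (illustrative only;
# the paper's actual grouping is not given in the abstract).
SHAPE_SETS = ["ਸਮ", "ਪਧਥਖਯ", "ਬਘ", "ਰਗ"]

# Map each consonant to the number of its set (1-based).
CONSONANT_CODE = {
    ch: code for code, group in enumerate(SHAPE_SETS, start=1) for ch in group
}


def encode(word):
    """Encode a word by the set numbers of its consonants.

    Characters outside the map (vowel signs, and in this sketch any
    consonant not listed above) are skipped.
    """
    return tuple(CONSONANT_CODE[ch] for ch in word if ch in CONSONANT_CODE)


def build_index(dictionary_words):
    """Group dictionary words by their shape code."""
    index = defaultdict(list)
    for w in dictionary_words:
        index[encode(w)].append(w)
    return index


def check(word, index):
    """Return (status, words) for one recognized word.

    "correct": the word itself is listed under its code.
    "suggest": the code exists; candidates are ranked by similarity.
    "unknown": the code is absent from the dictionary (assumed behavior).
    """
    candidates = index.get(encode(word), [])
    if not candidates:
        return ("unknown", [])
    if word in candidates:
        return ("correct", [word])
    ranked = sorted(
        candidates,
        key=lambda c: SequenceMatcher(None, word, c).ratio(),
        reverse=True,
    )
    return ("suggest", ranked)


index = build_index(["ਸਰ", "ਸਾਰ", "ਪਰ"])
print(check("ਪਰ", index))  # word found under its own code
print(check("ਮਰ", index))  # ਮ misread for ਸ: same code, so suggestions are ranked
```

Because a consonant and its shape-confusable neighbors share one set number, a word in which the OCR has substituted a look-alike consonant still maps to the same code as the intended word, which is what lets the dictionary lookup recover it as a suggestion.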