Preprocessing and Structural Features for a Multi-Fonts Arabic/Persian OCR

  • Authors:
  • Madana Kavianifar;Adnan Amin


  • Venue:
  • ICDAR '99 Proceedings of the Fifth International Conference on Document Analysis and Recognition
  • Year:
  • 1999

Abstract

English and Chinese are languages that have attracted tremendous interest from character recognition researchers. In contrast, research on character recognition for Arabic and Persian scripts faces major problems, which are mainly related to the unique characteristics of these scripts: they are cursive, a character takes multiple shapes depending on its position in a word, and characters are connected along the baseline. The proposed work consists of three major phases. After the text is digitized into a gray-scale image using a 300-dpi scanner, several preprocessing steps, such as noise reduction, global thresholding, skew correction, and the detection and grouping of connected components, are applied to the image. In the next phase, the sub-words of each word are identified, and global features are extracted for each word, such as the number of its sub-words, the number of peaks in its vertical projection profile, and the number and position of the complementary characters and loops within each sub-word. Contour tracing plays the most important role in the feature extraction phase.
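
Three of the steps named in the abstract (global thresholding, connected-component detection, and counting peaks in the vertical projection profile) can be illustrated with a minimal sketch. This is not the authors' implementation: the fixed threshold of 128, the use of 8-connectivity, the definition of a peak as a local maximum of the column ink count, and all function names are assumptions made here for illustration only.

```python
from collections import deque


def global_threshold(gray, t=128):
    """Binarize a gray-scale image: pixels darker than t become ink (1).

    The threshold value 128 is an assumed default, not taken from the paper.
    """
    return [[1 if p < t else 0 for p in row] for row in gray]


def connected_components(binary):
    """Return a list of components, each a list of (row, col) ink pixels.

    Uses breadth-first search with 8-connectivity (an assumption).
    """
    h, w = len(binary), len(binary[0])
    seen = [[False] * w for _ in range(h)]
    comps = []
    for y in range(h):
        for x in range(w):
            if not binary[y][x] or seen[y][x]:
                continue
            seen[y][x] = True
            queue = deque([(y, x)])
            pixels = []
            while queue:
                cy, cx = queue.popleft()
                pixels.append((cy, cx))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = cy + dy, cx + dx
                        if (0 <= ny < h and 0 <= nx < w
                                and binary[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            comps.append(pixels)
    return comps


def vertical_projection(binary):
    """Count the ink pixels in each column of the binary image."""
    return [sum(col) for col in zip(*binary)]


def count_peaks(profile):
    """Count local maxima of the profile; a flat plateau counts once."""
    peaks = 0
    rising = False
    prev = 0
    for v in profile + [0]:  # trailing 0 closes a peak at the right edge
        if v > prev:
            rising = True
        elif v < prev:
            if rising:
                peaks += 1
            rising = False
        prev = v
    return peaks


# Tiny hypothetical image: 255 is white paper, 0 is black ink.
gray = [
    [255, 255, 255, 255, 255, 255, 255],
    [255,   0, 255, 255, 255,   0, 255],
    [255,   0, 255, 255, 255,   0, 255],
    [255,   0,   0, 255, 255,   0, 255],
    [255, 255, 255, 255, 255, 255, 255],
]
binary = global_threshold(gray)
components = connected_components(binary)
profile = vertical_projection(binary)
num_peaks = count_peaks(profile)
```

On this toy image the sketch finds two connected components and two peaks in the vertical projection profile. A real system would likely choose the threshold from the image histogram and smooth the profile before counting peaks, but the abstract does not say which variants were used.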