Preprocessing and Structural Features for a Multi-Fonts Arabic/Persian OCR

Authors:
Madana Kavianifar;Adnan Amin
Affiliations:
-;-
Venue:
ICDAR '99 Proceedings of the Fifth International Conference on Document Analysis and Recognition
Year:
1999

Citing 0
Cited 2

A Gibbsian Kohonen Network for Online Arabic Character Recognition
ISVC '08 Proceedings of the 4th International Symposium on Advances in Visual Computing, Part II

A robust free size OCR for omni-font persian/arabic printed document using combined MLP/SVM
CIARP'05 Proceedings of the 10th Iberoamerican Congress conference on Progress in Pattern Recognition, Image Analysis and Applications

Abstract

English and Chinese have attracted tremendous interest from character recognition researchers. In contrast, character recognition research for Arabic scripts faces major problems, mainly related to the unique characteristics of Arabic and Persian writing: it is cursive, a character takes multiple shapes depending on its position in a word, and characters connect along the baseline. The proposed work consists of three major phases. The text is first digitized with a 300-dpi scanner, and the original image is converted to a gray-scale image. Preprocessing steps such as noise reduction, global thresholding, skew correction, and detection and grouping of connected components are then applied to the image file. In the next phase, the sub-words of each word are identified, and global features are extracted for each word, such as the number of its sub-words, the number of peaks in its vertical projection profile, and the number and position of the complementary characters and loops within each sub-word. Contour tracing plays the most important role in the feature extraction phase.
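
To make three of the steps named in the abstract concrete (global thresholding, connected-component detection, and counting peaks in the vertical projection profile), here is a minimal Python sketch. It is an illustration under stated assumptions and not the authors' implementation: the fixed threshold of 128, the use of 8-connectivity, the definition of a peak as a local maximum of the column sums, and all function names and the toy image are hypothetical choices made for this example.

```python
from collections import deque

# Hypothetical 5x7 gray-scale image (0 = black ink, 255 = white paper).
TOY_IMAGE = [
    [255, 255, 255, 255, 255, 255, 255],
    [255,   0, 255, 255, 255,   0, 255],
    [255,   0, 255, 255, 255,   0, 255],
    [255,   0,   0, 255,   0,   0, 255],
    [255, 255, 255, 255, 255, 255, 255],
]

def global_threshold(gray, t=128):
    """Binarize with one threshold for the whole image: 1 = ink, 0 = background.
    The value 128 is an assumption; the abstract does not state how t is chosen."""
    return [[1 if px < t else 0 for px in row] for row in gray]

def connected_components(binary):
    """Group ink pixels into 8-connected components using breadth-first search.
    Returns a list of components, each a list of (row, col) pixels."""
    h, w = len(binary), len(binary[0])
    seen = [[False] * w for _ in range(h)]
    components = []
    for y in range(h):
        for x in range(w):
            if binary[y][x] and not seen[y][x]:
                seen[y][x] = True
                queue = deque([(y, x)])
                pixels = []
                while queue:
                    cy, cx = queue.popleft()
                    pixels.append((cy, cx))
                    # Visit all 8 neighbours of the current pixel.
                    for dy in (-1, 0, 1):
                        for dx in (-1, 0, 1):
                            ny, nx = cy + dy, cx + dx
                            if (0 <= ny < h and 0 <= nx < w
                                    and binary[ny][nx] and not seen[ny][nx]):
                                seen[ny][nx] = True
                                queue.append((ny, nx))
                components.append(pixels)
    return components

def vertical_projection(binary):
    """Count the ink pixels in each column of the binary image."""
    return [sum(column) for column in zip(*binary)]

def count_peaks(profile):
    """Count local maxima of the profile; a flat plateau counts as one peak."""
    peaks = 0
    prev = 0
    rising = False
    for value in profile + [0]:  # trailing 0 closes a peak at the right edge
        if value > prev:
            rising = True
        elif value < prev:
            if rising:
                peaks += 1
            rising = False
        prev = value
    return peaks
```

On the toy image, thresholding followed by `connected_components` yields two components, and the vertical projection `[0, 3, 1, 0, 1, 3, 0]` has two peaks. Note that in real Arabic or Persian text a raw component count is not the sub-word count, since dots and other complementary characters form separate components and would first have to be grouped with their sub-word, as the abstract's grouping step suggests.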