Automatic processing of Arabic text

Authors:
Ziad Osman;Lama Hamandi;Rached Zantout;Fadi N. Sibai
Affiliations:
Electrical Engineering, Beirut Arab University, Beirut, Lebanon;Electrical and Computer Engineering, American University of Beirut, Lebanon;College of Computer and Info Sciences, Prince Sultan University, Riyadh;Computer Systems Design, College of IT, UAE University, Al Ain, UAE
Venue:
IIT'09 Proceedings of the 6th international conference on Innovations in information technology
Year:
2009

Abstract

Automatic recognition of printed and handwritten documents remains an active area of research, and Arabic is one of the languages that present special problems. Arabic script is cursive and therefore requires a segmentation process to determine the boundaries of each character. Arabic characters can consist of multiple disconnected parts. Dots and diacritics are used in many Arabic characters and can appear above or below the main body of the character. The same letter can take up to four different forms, depending on its position in the word and on the letters adjacent to it. This paper describes a novel approach to recognizing Arabic script documents. The method starts with preprocessing, which involves binarization, noise reduction, and thinning. The text is then segmented into separate lines, and characters are segmented by locating bifurcation points near the baseline. Segmented characters are then compared to prestored templates to identify the best match. The template comparisons are based on central moments, Hu moments, and invariant moments. The method is shown to work satisfactorily on scanned printed Arabic text. The paper concludes with a discussion of the drawbacks of the method and a description of possible solutions.
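As a rough illustration of two stages named in the abstract, line segmentation and moment-based template matching, the following Python sketch may help. It is not the authors' implementation. The abstract does not say how lines are found, so the row-projection rule in `segment_lines` is an assumption. The Euclidean distance in `best_match`, the choice of the first four Hu moments, and all function and template names are likewise hypothetical choices made for this example.

```python
from math import dist

Image = list[list[int]]  # binary image rows; 1 = ink, 0 = background


def segment_lines(page: Image) -> list[tuple[int, int]]:
    """Return (top, bottom) row ranges of text lines, bottom exclusive.

    Assumption: text lines are separated by rows that contain no ink.
    """
    lines, start = [], None
    for y, row in enumerate(page):
        if any(row) and start is None:
            start = y                    # first inked row opens a line
        elif not any(row) and start is not None:
            lines.append((start, y))     # blank row closes the line
            start = None
    if start is not None:
        lines.append((start, len(page)))
    return lines


def hu_moments(glyph: Image) -> list[float]:
    """First four Hu moments of a non-empty binary glyph."""
    pixels = [(x, y) for y, row in enumerate(glyph)
              for x, v in enumerate(row) if v]
    area = len(pixels)                   # zeroth moment of a binary image
    cx = sum(x for x, _ in pixels) / area
    cy = sum(y for _, y in pixels) / area

    def eta(p: int, q: int) -> float:
        # Central moment about the centroid, normalized for scale.
        mu = sum((x - cx) ** p * (y - cy) ** q for x, y in pixels)
        return mu / area ** (1 + (p + q) / 2)

    n20, n02, n11 = eta(2, 0), eta(0, 2), eta(1, 1)
    n30, n03, n21, n12 = eta(3, 0), eta(0, 3), eta(2, 1), eta(1, 2)
    return [
        n20 + n02,
        (n20 - n02) ** 2 + 4 * n11 ** 2,
        (n30 - 3 * n12) ** 2 + (3 * n21 - n03) ** 2,
        (n30 + n12) ** 2 + (n21 + n03) ** 2,
    ]


def best_match(glyph: Image, templates: dict[str, Image]) -> str:
    """Name of the template whose Hu moments are closest to the glyph's."""
    features = hu_moments(glyph)
    return min(templates,
               key=lambda name: dist(features, hu_moments(templates[name])))


# Toy templates (hypothetical shapes, not Arabic glyphs).
templates = {
    "bar": [[1, 1, 1, 1, 1]],
    "square": [[1, 1, 1], [1, 1, 1], [1, 1, 1]],
}
# The same square, shifted inside a larger image.
query = [
    [0, 0, 0, 0, 0, 0],
    [0, 0, 1, 1, 1, 0],
    [0, 0, 1, 1, 1, 0],
    [0, 0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0, 0],
]
print(best_match(query, templates))  # prints "square"
```

Because the moments are taken about the centroid, the shifted square yields the same features as its template, which is the property that makes moment-based comparison attractive for segmented characters. A practical system would likely need scaled or log-transformed features, since higher-order Hu moments are much smaller in magnitude than the first.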