A novel minimal Arabic script for preparing databases and benchmarks for Arabic text recognition research

Authors:
Husni A. Al-Muhtaseb;Sabri A. Mahmoud;Rami S. Qahwaji
Affiliations:
Information and Computer Science Department, King Fahd University of Petroleum & Minerals, Dhahran, Saudi Arabia;Information and Computer Science Department, King Fahd University of Petroleum & Minerals, Dhahran, Saudi Arabia;Electronic Imaging and Media Communications Department, University of Bradford, Bradford, UK
Venue:
WAV'09 Proceedings of the 3rd WSEAS international symposium on Wavelets theory and applications in applied mathematics, signal processing & modern science
Year:
2009

Abstract

This work presents a minimal Arabic script for training and testing Arabic text recognition systems. The script can also be used to collect handwritten samples from different writers in order to build handwritten text databases for benchmarking such systems. It covers every shape of each letter of the Arabic alphabet in all positions (standalone, initial, medial, and terminal), and the frequency of each shape in the text is kept as low as possible. The script is novel in several respects. A writer can cover all possible letter shapes, 125 in total, by writing only three lines of meaningful Arabic text. Collecting the script from many writers provides evenly distributed letter frequencies and ensures enough samples of every character shape, which enables proper training and, in turn, higher accuracy in the recognition phase. The same holds for printed Arabic text. This is especially useful when a large number of features is used with classifiers that require many samples per category, such as hidden Markov models and neural networks. In addition, the paper presents a statistical analysis of Arabic corpora that estimates the number of occurrences of the different letter shapes in large corpora. These letter-usage frequencies were used to improve the search for the minimal Arabic text.
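
The shape statistics and the coverage check described in the abstract can be illustrated with a rough sketch. The Python below is not the authors' method. It assumes a simplified joining model in which each letter is standalone, initial, medial, or terminal depending only on whether it connects to its neighbours; it ignores ligatures such as lam-alef, so it will not reproduce the paper's count of 125 shapes. The names `RIGHT_JOINING`, `NON_JOINING`, `shape_counts`, and `missing_shapes` are hypothetical and chosen only for this illustration.

```python
from collections import Counter

# Assumed, simplified joining classes (not taken from the paper).
# These letters join to the preceding letter only, never to the next one.
RIGHT_JOINING = set("اأإآدذرزوؤةى")
# Hamza on the line joins to neither neighbour.
NON_JOINING = set("ء")


def is_letter(ch):
    """True for one character in the basic Arabic letter ranges."""
    return len(ch) == 1 and (
        "\u0621" <= ch <= "\u063A" or "\u0641" <= ch <= "\u064A"
    )


def joins_forward(ch):
    """True if ch can connect to the letter that follows it."""
    return is_letter(ch) and ch not in RIGHT_JOINING and ch not in NON_JOINING


def shape_counts(text):
    """Count (letter, position) pairs under the simplified joining model."""
    # Drop tatweel and short-vowel marks so they do not split words.
    cleaned = "".join(
        c for c in text if c != "\u0640" and not ("\u064B" <= c <= "\u0652")
    )
    counts = Counter()
    for i, ch in enumerate(cleaned):
        if not is_letter(ch):
            continue  # spaces, digits, punctuation act as word breaks
        prev = cleaned[i - 1] if i > 0 else ""
        nxt = cleaned[i + 1] if i + 1 < len(cleaned) else ""
        joined_before = joins_forward(prev) and ch not in NON_JOINING
        joined_after = (
            joins_forward(ch) and is_letter(nxt) and nxt not in NON_JOINING
        )
        if joined_before and joined_after:
            position = "medial"
        elif joined_before:
            position = "terminal"
        elif joined_after:
            position = "initial"
        else:
            position = "standalone"
        counts[(ch, position)] += 1
    return counts


def missing_shapes(text, letters):
    """Return the (letter, position) pairs of `letters` absent from `text`."""
    expected = set()
    for ch in letters:
        if ch in NON_JOINING:
            positions = ["standalone"]
        elif ch in RIGHT_JOINING:
            positions = ["standalone", "terminal"]
        else:
            positions = ["standalone", "initial", "medial", "terminal"]
        expected.update((ch, p) for p in positions)
    return expected - set(shape_counts(text))
```

Under these assumptions, `shape_counts("باب")` counts one initial ب, one terminal ا, and one standalone ب, because ا does not connect to the letter after it. Running `shape_counts` over a large corpus gives the kind of per-shape frequency table the abstract refers to, and `missing_shapes` returning an empty set for a candidate text means that text covers every shape in this simplified model.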