Developing typewritten Arabic corpus with multi-fonts (TRACOM)

Authors:
Mohammed S. Khorsheed;Khaled M. Alhazmi;Adil M. Asiri
Affiliations:
King Abdulaziz City for Science and Technology, Riyadh, Saudi Arabia;King Abdulaziz City for Science and Technology, Riyadh, Saudi Arabia;King Abdulaziz City for Science and Technology, Riyadh, Saudi Arabia
Venue:
Proceedings of the International Workshop on Multilingual OCR
Year:
2009

Abstract

One of the main obstacles that has delayed the development of character recognition systems for Arabic, compared with other languages such as Latin and Chinese, is the absence of supporting resources such as language corpora and electronic dictionaries. This paper develops a diverse corpus of scanned page images with corresponding ground-truth text and description files. The corpus is a TypewRitten Arabic Corpus with Multi-fonts, referred to as TRACOM. TRACOM may also serve as a benchmark for assessing the performance of Arabic text recognition systems. The corpus includes data from the following sources: computer-generated documents, newspapers, magazines, and books. Each document image is paired with its equivalent text, i.e., the ground truth.