Offline recognition of omnifont Arabic text using the HMM ToolKit (HTK)

  • Authors:
  • M. S. Khorsheed

  • Affiliations:
  • King AbdulAziz City for Science and Technology (KACST), P.O. Box 6086, Riyadh 11442, Saudi Arabia

  • Venue:
  • Pattern Recognition Letters
  • Year:
  • 2007

Abstract

This paper presents a cursive Arabic text recognition system. The system decomposes the document image into text line images and extracts a set of simple statistical features from a narrow window that slides along each text line. It then feeds the resulting feature vectors into the Hidden Markov Model Toolkit (HTK), a portable toolkit for building speech recognition systems. The proposed system is applied to a data corpus that includes Arabic text from more than 600 A4-size sheets typewritten in multiple computer-generated fonts.
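The sliding-window step described in the abstract can be illustrated with a minimal sketch. The abstract does not name the features, the window width, or the scan direction, so everything specific here is an assumption made for illustration: the function name `sliding_window_features`, the two features (ink density and normalized vertical centroid), the default `width` and `step`, and the left-to-right scan. It is not the paper's actual feature set.

```python
from typing import List

Image = List[List[int]]  # rows of binary pixels, 1 = ink, 0 = background


def sliding_window_features(line: Image, width: int = 3, step: int = 1) -> List[List[float]]:
    """Slide a narrow window along a text-line image and return one
    feature vector per window position.

    The two features are illustrative assumptions, not the paper's:
      - ink density: fraction of ink pixels inside the window
      - vertical centroid: mean row index of ink pixels, scaled to [0, 1]
    """
    if not line or not line[0]:
        return []
    height, n_cols = len(line), len(line[0])
    vectors: List[List[float]] = []
    # Scan direction is an assumption; iterate the range in reverse
    # to scan right-to-left instead.
    for left in range(0, n_cols - width + 1, step):
        ink = 0
        row_weighted = 0
        for r in range(height):
            row_ink = sum(line[r][left:left + width])
            ink += row_ink
            row_weighted += r * row_ink
        density = ink / (height * width)
        if ink and height > 1:
            centroid = (row_weighted / ink) / (height - 1)
        else:
            centroid = 0.5  # neutral value for an empty window
        vectors.append([density, centroid])
    return vectors


if __name__ == "__main__":
    toy_line = [
        [0, 1, 1, 0],
        [0, 0, 1, 0],
        [0, 0, 0, 0],
    ]
    for vec in sliding_window_features(toy_line, width=2, step=1):
        print(vec)
```

Each returned vector plays the role of one observation frame, which is what makes a speech-oriented toolkit such as HTK applicable: the sequence of window positions along the text line stands in for the sequence of time frames in an audio signal.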