Using a hidden Markov model to transcribe handwritten Bushman texts

  • Authors:
  • Kyle Williams; Hussein Suleman

  • Affiliations:
  • University of Cape Town, Cape Town, South Africa; University of Cape Town, Cape Town, South Africa

  • Venue:
  • Proceedings of the 11th annual international ACM/IEEE joint conference on Digital libraries
  • Year:
  • 2011

Abstract

The Bushman texts in the Bleek and Lloyd Collection contain complex diacritics that make automatic transcription difficult. Transcriptions of these texts would enable enhanced digital library services for interacting with the collection. This study investigated automatic transcription of the Bushman texts using a Hidden Markov Model for text line recognition, a popular approach to handwriting recognition. The results show that while this technique may be well suited to well-constrained, well-understood scripts, applying it to more complex scripts introduces a number of difficulties that need to be overcome.