Statistical Machine Translation as a Language Model for Handwriting Recognition

  • Authors:
  • Jacob Devlin; Matin Kamali; Krishna Subramanian; Rohit Prasad; Prem Natarajan

  • Venue:
  • ICFHR '12 Proceedings of the 2012 International Conference on Frontiers in Handwriting Recognition
  • Year:
  • 2012

Abstract

When performing handwriting recognition on natural language text, the use of a word-level language model (LM) is known to significantly improve recognition accuracy. The most common type of language model, the n-gram model, decomposes sentences into short, overlapping chunks. In this paper, we propose a new type of language model that we use in addition to the standard n-gram LM. Our new model uses the likelihood score from a statistical machine translation system as a reranking feature. In general terms, we automatically translate each OCR hypothesis into another language and then create a feature score based on how "difficult" the translation was to perform. Intuitively, the difficulty of translation correlates with how well-formed the input sentence is. On an Arabic handwriting recognition task, we obtained a 0.4% absolute improvement in word error rate (WER) on top of a strong 5-gram LM.
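The following is a minimal sketch of the general idea in Python. It assumes N-best reranking with a log-linear (weighted-sum) combination of per-hypothesis feature scores. The feature names `recognizer`, `ngram_lm`, and `mt_score`, the weights, and all scores are hypothetical. `mt_score` stands in for whatever likelihood an MT system would report for translating a hypothesis and is not computed here. The `ngrams` helper only illustrates how an n-gram LM splits a sentence into overlapping chunks; it is not the authors' implementation.

```python
from dataclasses import dataclass


def ngrams(words, n):
    """Decompose a tokenized sentence into overlapping n-grams, padded with
    sentence-boundary markers the way an n-gram LM typically sees them."""
    padded = ["<s>"] * (n - 1) + list(words) + ["</s>"]
    return [tuple(padded[i:i + n]) for i in range(len(padded) - n + 1)]


@dataclass
class Hypothesis:
    text: str
    features: dict  # feature name -> log-domain score (higher is better)


def rerank(nbest, weights):
    """Return the N-best list sorted best-first by a weighted sum of features."""
    def total(h):
        return sum(w * h.features[name] for name, w in weights.items())
    return sorted(nbest, key=total, reverse=True)


# Hypothetical scores for two OCR hypotheses of the same text line.
# "mt_score" is a placeholder for an MT likelihood (less negative = easier to translate).
nbest = [
    Hypothesis("hyp A", {"recognizer": -10.0, "ngram_lm": -20.0, "mt_score": -30.0}),
    Hypothesis("hyp B", {"recognizer": -10.5, "ngram_lm": -20.2, "mt_score": -25.0}),
]
baseline = {"recognizer": 1.0, "ngram_lm": 0.5, "mt_score": 0.0}  # MT feature off
with_mt = {"recognizer": 1.0, "ngram_lm": 0.5, "mt_score": 0.2}   # MT feature on

print(ngrams(["w1", "w2"], 3))
# [('<s>', '<s>', 'w1'), ('<s>', 'w1', 'w2'), ('w1', 'w2', '</s>')]
print(rerank(nbest, baseline)[0].text)  # hyp A
print(rerank(nbest, with_mt)[0].text)   # hyp B
```

In this toy example the extra feature changes the top choice: with the MT weight at zero, hyp A scores -20.0 against -20.6 for hyp B, but with a weight of 0.2 the easier-to-translate hyp B wins (-25.6 against -26.0). In a real system the weights would typically be tuned on held-out data, which this sketch does not do.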