Statistical Machine Translation as a Language Model for Handwriting Recognition

Authors:
Jacob Devlin;Matin Kamali;Krishna Subramanian;Rohit Prasad;Prem Natarajan
Venue:
ICFHR '12 Proceedings of the 2012 International Conference on Frontiers in Handwriting Recognition
Year:
2012

Abstract

When performing handwriting recognition on natural language text, the use of a word-level language model (LM) is known to significantly improve recognition accuracy. The most common type of language model, the n-gram model, decomposes sentences into short, overlapping chunks. In this paper, we propose a new type of language model, which we use in addition to the standard n-gram LM. Our new model uses the likelihood score from a statistical machine translation system as a reranking feature. In general terms, we automatically translate each OCR hypothesis into another language and then create a feature score based on how "difficult" it was to perform the translation. Intuitively, the difficulty of translation correlates with how well-formed the input sentence is. In an Arabic handwriting recognition task, we obtained a 0.4% absolute improvement in word error rate (WER) on top of a powerful 5-gram LM.
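
The reranking idea in the abstract can be illustrated with a minimal sketch. It assumes a weighted linear combination of log-scores over an n-best list, which is a common way to apply reranking features; the abstract does not state how the scores are combined, so this is not necessarily the authors' method. The names `Hypothesis` and `rerank`, the weights, and every score below are hypothetical values chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Hypothesis:
    """One recognition hypothesis with its feature scores (all log-domain, hypothetical)."""
    text: str
    recognizer_score: float  # score from the handwriting recognizer
    ngram_lm_score: float    # log-probability under the n-gram LM
    smt_score: float         # likelihood score from translating this hypothesis


def rerank(nbest, weights):
    """Sort hypotheses by a weighted sum of their feature scores, best first."""
    def combined(h):
        return (weights["recognizer"] * h.recognizer_score
                + weights["ngram_lm"] * h.ngram_lm_score
                + weights["smt"] * h.smt_score)
    return sorted(nbest, key=combined, reverse=True)


# Toy n-best list: A is slightly preferred by the recognizer and the n-gram LM,
# but B was much "easier" to translate (higher SMT score).
nbest = [
    Hypothesis("hypothesis A", -10.0, -20.0, -35.0),
    Hypothesis("hypothesis B", -10.5, -20.5, -25.0),
]
weights = {"recognizer": 1.0, "ngram_lm": 0.5, "smt": 0.2}

best = rerank(nbest, weights)[0]
print(best.text)
```

With the SMT weight set to zero, hypothesis A ranks first in this toy example; adding the SMT feature moves hypothesis B to the top, which is the kind of change in ranking the extra feature is meant to produce. In practice the weights would be tuned on held-out data rather than set by hand.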