Rejection threshold estimation for an unknown language model in an OCR task

  • Authors:
  • Joaquim Arlandis;Juan-Carlos Perez-Cortes;J. Ramon Navarro-Cerdan;Rafael Llobet

  • Affiliations:
  • Instituto Tecnológico de Informática, Universitat Politècnica de València, València, Spain (all four authors)

  • Venue:
  • SSPR&SPR'10 Proceedings of the 2010 joint IAPR international conference on Structural, syntactic, and statistical pattern recognition
  • Year:
  • 2010

Abstract

In an OCR post-processing task, a language model is used to find the best transformation of the OCR hypothesis into a string compatible with the language. The cost of this transformation is used as a confidence value to reject the strings that are less likely to be correct, and the error rate of the accepted strings should be strictly controlled by the user. In this work, the expected error rate distribution of an unknown language model is estimated from a training set composed of known language models. This means that after building a new language model, the user should be able to automatically "fix" the expected error rate at an acceptable level instead of having to deal with an arbitrary threshold.
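The idea of trading an arbitrary cost threshold for a target error rate can be illustrated with a minimal sketch. It is not the estimation method of the paper, which predicts the error rate distribution for an unknown language model from a set of known ones. It shows only the simpler, related step of picking a threshold from labeled data. The function name `pick_threshold`, the `(cost, is_correct)` sample layout, and the rule "accept when cost <= threshold" are assumptions made for illustration.

```python
from typing import List, Optional, Tuple


def pick_threshold(
    samples: List[Tuple[float, bool]], max_error_rate: float
) -> Optional[float]:
    """Return the largest cost threshold whose accepted set stays within
    max_error_rate, or None if no threshold qualifies.

    samples: hypothetical (transformation_cost, is_correct) pairs.
    A string is accepted when its cost is <= the threshold.
    """
    ordered = sorted(samples, key=lambda s: s[0])
    best: Optional[float] = None
    errors = 0
    i = 0
    n = len(ordered)
    while i < n:
        cost = ordered[i][0]
        # Consume every sample sharing this cost, since a threshold
        # cannot separate strings with identical cost.
        while i < n and ordered[i][0] == cost:
            if not ordered[i][1]:
                errors += 1
            i += 1
        # i is now the number of strings accepted at this threshold.
        if errors / i <= max_error_rate:
            best = cost
    return best


# Toy labeled set: lower cost means the hypothesis needed less correction.
validation = [(0.1, True), (0.2, True), (0.5, False), (0.7, True), (0.9, False)]
threshold = pick_threshold(validation, max_error_rate=0.25)
```

The scan keeps the largest qualifying threshold rather than stopping at the first violation, because the error rate of the accepted set is not monotonic in the threshold. Choosing the largest value maximizes the number of accepted strings for the given error bound.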