Rejection threshold estimation for an unknown language model in an OCR task

Authors:
Joaquim Arlandis, Juan-Carlos Perez-Cortes, J. Ramon Navarro-Cerdan, Rafael Llobet
Affiliation (all authors):
Instituto Tecnológico de Informática, Universitat Politècnica de València, València, Spain
Venue:
SSPR&SPR'10: Proceedings of the 2010 Joint IAPR International Conference on Structural, Syntactic, and Statistical Pattern Recognition
Year:
2010

Abstract

In an OCR post-processing task, a language model is used to find the best transformation of the OCR hypothesis into a string compatible with the language. The cost of this transformation serves as a confidence value for rejecting the strings least likely to be correct, and the user should be able to strictly control the error rate of the accepted strings. In this work, the expected error-rate distribution of an unknown language model is estimated from a training set composed of known language models. As a result, after building a new language model, the user should be able to automatically "fix" the expected error rate at an acceptable level instead of having to tune an arbitrary threshold.
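The abstract does not specify the correction model or the estimator at this level of detail, so the following Python sketch only illustrates the general idea under simplifying assumptions that are not taken from the paper. A finite lexicon with unit-cost Levenshtein distance stands in for the language model and its error-correcting parser; the transformation cost is used directly as the rejection score; and the expected error rate of an unseen model is approximated by aggregating (averaging, by default) the error-versus-threshold curves measured on known models, then picking the largest threshold whose aggregated error stays at or below the user's target. The function names (`edit_distance`, `correct`, `error_curve`, `estimate_threshold`) and the toy data are hypothetical.

```python
from statistics import mean
from typing import Callable, Iterable, Optional

# A sample is (rejection cost, whether the corrected string was right).
Sample = tuple[float, bool]


def edit_distance(a: str, b: str) -> int:
    """Unit-cost Levenshtein distance between a and b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute or match
        prev = cur
    return prev[-1]


def correct(hypothesis: str, lexicon: list[str]) -> tuple[str, int]:
    """Return the lexicon string closest to the OCR hypothesis and its cost.

    Assumption: a finite lexicon stands in for the language model.
    Ties are broken by lexicon order.
    """
    return min(((w, edit_distance(hypothesis, w)) for w in lexicon),
               key=lambda pair: pair[1])


def error_curve(samples: list[Sample], thresholds: list[float]) -> list[float]:
    """Error rate among accepted strings (cost <= t) for each threshold t."""
    curve = []
    for t in thresholds:
        accepted = [ok for cost, ok in samples if cost <= t]
        # With nothing accepted there are no accepted errors: report 0.0.
        curve.append(1 - sum(accepted) / len(accepted) if accepted else 0.0)
    return curve


def estimate_threshold(known_models: list[list[Sample]],
                       thresholds: list[float],
                       target: float,
                       agg: Callable[[Iterable[float]], float] = mean
                       ) -> Optional[float]:
    """Largest threshold whose aggregated error rate is <= target, else None.

    Each entry of known_models holds validation samples for one known
    language model; agg combines their per-threshold error rates.
    """
    curves = [error_curve(s, thresholds) for s in known_models]
    chosen = None
    for k, t in enumerate(thresholds):
        if agg(c[k] for c in curves) <= target:
            chosen = t
    return chosen


# Hypothetical validation data for two known language models.
model_a = [(0, True), (0, True), (1, True), (1, False), (2, False)]
model_b = [(0, True), (1, True), (1, True), (2, True), (2, False)]

print(correct("VALENC1A", ["MADRID", "VALENCIA", "SEVILLA"]))  # ('VALENCIA', 1)
print(estimate_threshold([model_a, model_b], [0, 1, 2], 0.15))  # 1
print(estimate_threshold([model_a, model_b], [0, 1, 2], 0.15, agg=max))  # 0
```

Passing `agg=max` replaces the average with the worst known model, a more conservative choice when the accepted error rate must not exceed the target. Because error-versus-threshold curves need not be monotone, the sketch simply keeps the largest qualifying threshold; it also produces a single point estimate, whereas the paper estimates a full expected error-rate distribution.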