OCR Post-processing Using Weighted Finite-State Transducers

Authors:
Rafael Llobet;Jose-Ramon Cerdan-Navarro;Juan-Carlos Perez-Cortes;Joaquim Arlandis
Venue:
ICPR '10 Proceedings of the 2010 20th International Conference on Pattern Recognition
Year:
2010

Abstract

We propose a new approach to Stochastic Error-Correcting Language Modeling, based on Weighted Finite-State Transducers (WFSTs), for post-processing the output of an optical character recognizer (OCR). Rather than feeding only the recognized string to the transducer, our approach uses the complete set of OCR hypotheses, given as a sequence of vectors of a posteriori class probabilities, to build a WFST that is then composed with independent WFSTs for the error model and the language model. This combines the practical advantages of a decoupled (OCR + post-processor) model with the full power of an integrated model.
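
The idea of composing the hypothesis, error, and language models can be illustrated with a small pure-Python sketch. It is not the authors' implementation: it assumes the language model is a plain word lexicon with unigram probabilities, the error model has a handful of fixed probabilities, and weights are negative log probabilities (a tropical semiring), so that the best path through the composition reduces to a dynamic-programming alignment. The function name `correct`, the parameters `p_match`, `p_sub`, `p_ins`, `p_del`, and all numeric values are hypothetical and chosen only for illustration.

```python
import math


def neg_log(p):
    """Convert a probability into a tropical-semiring weight (a cost)."""
    return -math.log(p) if p > 0 else math.inf


def correct(hypotheses, lexicon, p_match=0.9, p_sub=0.05, p_ins=0.01, p_del=0.01):
    """Return (best_word, cost) for a sequence of OCR posterior vectors.

    hypotheses: list of dicts mapping a character to its posterior probability.
    lexicon:    dict mapping a word to its (assumed) unigram probability.
    The p_* arguments are a toy error model, not estimated from data.
    """
    def slot_cost(vector, target):
        # One hypothesis slot composed with the error model: pick the cheapest
        # observed symbol that can be rewritten (or kept) as `target`.
        return min(neg_log(p) + neg_log(p_match if s == target else p_sub)
                   for s, p in vector.items())

    def spurious_cost(vector):
        # The OCR emitted a character that has no counterpart in the word.
        return min(neg_log(p) for p in vector.values()) + neg_log(p_ins)

    missed_cost = neg_log(p_del)  # the OCR skipped a character of the word
    best_word, best_cost = None, math.inf

    for word, prior in lexicon.items():
        t, n = len(hypotheses), len(word)
        # d[i][j]: cheapest way to align the first i slots with word[:j]
        d = [[math.inf] * (n + 1) for _ in range(t + 1)]
        d[0][0] = 0.0
        for j in range(1, n + 1):
            d[0][j] = d[0][j - 1] + missed_cost
        for i in range(1, t + 1):
            d[i][0] = d[i - 1][0] + spurious_cost(hypotheses[i - 1])
            for j in range(1, n + 1):
                d[i][j] = min(
                    d[i - 1][j - 1] + slot_cost(hypotheses[i - 1], word[j - 1]),
                    d[i - 1][j] + spurious_cost(hypotheses[i - 1]),
                    d[i][j - 1] + missed_cost,
                )
        total = d[t][n] + neg_log(prior)  # add the language-model weight
        if total < best_cost:
            best_word, best_cost = word, total

    return best_word, best_cost


# Made-up posteriors for three character positions (illustrative values only).
hypotheses = [
    {"c": 0.9, "e": 0.1},
    {"o": 0.6, "a": 0.4},
    {"t": 0.8, "l": 0.2},
]
# Made-up lexicon standing in for the language model.
lexicon = {"cat": 0.5, "cot": 0.01, "cut": 0.49}

top1 = "".join(max(v, key=v.get) for v in hypotheses)
best_word, best_cost = correct(hypotheses, lexicon)
print(top1, best_word, round(best_cost, 3))
```

With these invented numbers the highest-scoring OCR string is "cot", while the combined model prefers "cat", because the second-best character in the middle slot still carries substantial posterior mass and the lexicon favors that word. A real system would replace the lexicon loop with an actual WFST composition and shortest-path search in a finite-state toolkit.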