Contextual word recognition using probabilistic relaxation labeling
Pattern Recognition
Techniques for automatically correcting words in text
ACM Computing Surveys (CSUR)
A Rational Design for a Weighted Finite-State Transducer Library
WIA '97 Revised Papers from the Second International Workshop on Implementing Automata
Design of a linguistic postprocessor using variable memory length Markov models
ICDAR '95 Proceedings of the Third International Conference on Document Analysis and Recognition (Volume 1) - Volume 1
Stochastic Error-Correcting Parsing for OCR Post-Processing
ICPR '00 Proceedings of the International Conference on Pattern Recognition - Volume 4
The surprise language exercises
ACM Transactions on Asian Language Information Processing (TALIP)
ACL '98 Proceedings of the 35th Annual Meeting of the Association for Computational Linguistics and Eighth Conference of the European Chapter of the Association for Computational Linguistics
The Bible and multilingual optical character recognition
Communications of the ACM - 3d hard copy
BLEU: a method for automatic evaluation of machine translation
ACL '02 Proceedings of the 40th Annual Meeting on Association for Computational Linguistics
Greedy decoding for statistical machine translation in almost linear time
NAACL '03 Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology - Volume 1
A generative probabilistic OCR model for NLP applications
NAACL '03 Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology - Volume 1
NAACL '03 Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology - Volume 1
An improved error model for noisy channel spelling correction
ACL '00 Proceedings of the 38th Annual Meeting on Association for Computational Linguistics
Improved statistical alignment models
ACL '00 Proceedings of the 38th Annual Meeting on Association for Computational Linguistics
When is an embedded MT system "good enough" for filtering?
NAACL-ANLP-EMTS '00 Proceedings of the 2000 NAACL-ANLP Workshop on Embedded machine translation systems - Volume 5
A hierarchical phrase-based model for statistical machine translation
ACL '05 Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics
Effect of OCR-errors on the transformation of semi-structured text data into relational database
Proceedings of The Third Workshop on Analytics for Noisy Unstructured Text Data
Non-interactive OCR post-correction for giga-scale digitization projects
CICLing'08 Proceedings of the 9th international conference on Computational linguistics and intelligent text processing
Rejection threshold estimation for an unknown language model in an OCR task
SSPR&SPR'10 Proceedings of the 2010 joint IAPR international conference on Structural, syntactic, and statistical pattern recognition
Hi-index | 0.00 |
We present a lexicon-free post-processing method for optical character recognition (OCR), implemented using weighted finite state machines. We evaluate the technique in a number of scenarios relevant for natural language processing, including creation of new OCR capabilities for low density languages, improvement of OCR performance for a native commercial system, acquisition of knowledge from a foreign-language dictionary, creation of a parallel text, and machine translation from OCR output.