Modern optical character recognition software relies on human interaction to correct misrecognized characters. Even though the software often reliably identifies low-confidence output, the simple language and vocabulary models employed are insufficient to automatically correct mistakes. This paper demonstrates that topic models, which automatically detect and represent an article's semantic context, reduce error by 7% over a global word distribution in a simulated OCR correction task. Detecting and leveraging context in this manner is an important step towards improving OCR.
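The contrast between a global word distribution and a topic-aware one can be illustrated with a minimal sketch. It is not the paper's implementation: the topics, the probabilities, the candidate words, and the helper names `mixture_prob` and `pick` are all hypothetical, chosen only to show how a document's topic mixture can change which candidate is preferred for a low-confidence token.

```python
# Toy, hand-picked probabilities p(word | topic); illustrative only,
# not taken from the paper.
TOPIC_WORD = {
    "finance": {"bank": 0.30, "loan": 0.30, "band": 0.01, "rate": 0.39},
    "music":   {"band": 0.40, "song": 0.40, "bank": 0.02, "rate": 0.18},
}


def mixture_prob(word, topic_weights):
    """Return sum_k p(word | k) * weight(k) for the given topic weights."""
    return sum(
        weight * TOPIC_WORD[topic].get(word, 0.0)
        for topic, weight in topic_weights.items()
    )


def pick(candidates, topic_weights):
    """Choose the candidate with the highest mixture probability."""
    return max(candidates, key=lambda w: mixture_prob(w, topic_weights))


# Hypothetical candidates for one low-confidence OCR token.
candidates = ["bank", "band"]

# Global distribution: topics mixed by an assumed corpus-wide prior.
corpus_prior = {"finance": 0.7, "music": 0.3}

# Contextual distribution: topic proportions assumed to have been
# inferred for this particular article (here, mostly about music).
doc_topics = {"finance": 0.1, "music": 0.9}

global_choice = pick(candidates, corpus_prior)
context_choice = pick(candidates, doc_topics)

print("global word distribution picks:", global_choice)
print("topic-aware distribution picks:", context_choice)
```

With these assumed numbers the global distribution prefers "bank", because finance text dominates the hypothetical corpus, while the topic-aware score prefers "band" for the music article. In a real system the topic-word probabilities and per-document topic proportions would be estimated by a trained topic model rather than written by hand.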