We investigate the problem of evaluating the performance of text processing algorithms on inputs that contain errors introduced by optical character recognition (OCR). We propose a new hierarchical paradigm based on approximate string matching that allows each stage in the processing pipeline to be tested, the effects of errors to be analyzed, and possible solutions to be suggested.
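To make the idea of approximate string matching concrete, the following is a minimal sketch of the standard edit-distance dynamic program, used here to score OCR output against ground-truth text. It is an illustration only and is not taken from the paper: the function names `edit_distance` and `char_error_rate` are hypothetical, and the unit cost for insertions, deletions, and substitutions is an assumption.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between a and b (unit costs assumed)."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty string
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # match or substitute
            ))
        prev = curr
    return prev[-1]


def char_error_rate(truth: str, ocr: str) -> float:
    """Edit distance normalized by the length of the ground truth."""
    if not truth:
        return 0.0 if not ocr else float("inf")
    return edit_distance(truth, ocr) / len(truth)
```

Under these assumptions, the same measure could be applied to the output of each pipeline stage (for example sentences or tokens) rather than only to raw characters, which is one way to read the hierarchical evaluation described above.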