Retrieving documents that originate from digitized, OCR-converted paper documents is an important task for modern retrieval systems. The problems that OCR errors cause for retrieval have been studied for several years. We approach the problem from a theoretical point of view and model OCR conversion as a random experiment. Our theoretical results, which are supported by experiments, show that information retrieval can cope with even a large number of recognition errors, provided that the documents are not too short and that the errors are distributed appropriately among words and documents. These results indicate that expensive manual or automatic post-processing of OCR-converted documents is usually not worthwhile, but that scanning and OCR must be performed carefully and in an appropriate way.
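
The intuition behind the document-length condition can be illustrated with a minimal sketch. It assumes, purely for illustration, that each occurrence of a term is misrecognized independently with the same probability; this simplification and the names `survival_probability` and `expected_surviving_occurrences` are hypothetical and are not taken from the paper's model. Under that assumption, a term that occurs several times in a longer document is very likely to survive OCR at least once, whereas a term that occurs once in a short document is lost with the full error probability.

```python
def survival_probability(term_frequency: int, error_rate: float) -> float:
    """Probability that at least one occurrence of a term is recognized correctly.

    Assumption (illustrative only): each of the `term_frequency` occurrences is
    corrupted independently with probability `error_rate`.
    """
    if term_frequency < 0:
        raise ValueError("term_frequency must be non-negative")
    if not 0.0 <= error_rate <= 1.0:
        raise ValueError("error_rate must lie in [0, 1]")
    # All occurrences are corrupted with probability error_rate ** term_frequency.
    return 1.0 - error_rate ** term_frequency


def expected_surviving_occurrences(term_frequency: int, error_rate: float) -> float:
    """Expected number of correctly recognized occurrences under the same assumption."""
    return term_frequency * (1.0 - error_rate)


if __name__ == "__main__":
    # Compare a term seen once (short document) with terms seen more often.
    for tf in (1, 2, 5, 10):
        print(tf, survival_probability(tf, 0.2), expected_surviving_occurrences(tf, 0.2))
```

Even with a per-occurrence error rate as high as 20%, the chance of losing every occurrence of a term shrinks geometrically with its frequency, which is consistent with the claim that retrieval tolerates many errors as long as documents are not too short.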