Locating and Recognizing Text in WWW Images
Information Retrieval
Reduction of Expanded Search Terms for Fuzzy English-Text Retrieval
ECDL '98 Proceedings of the Second European Conference on Research and Advanced Technology for Digital Libraries
Probabilistic Automaton Model for Fuzzy English-Text Retrieval
ECDL '00 Proceedings of the 4th European Conference on Research and Advanced Technology for Digital Libraries
Indexing and retrieval of words in old documents
ICDAR '03 Proceedings of the Seventh International Conference on Document Analysis and Recognition - Volume 1
Robust document image understanding technologies
Proceedings of the 1st ACM workshop on Hardcopy document processing
Ontology Guided Access to Document Images
ICDAR '05 Proceedings of the Eighth International Conference on Document Analysis and Recognition
In this paper, we examine the effects of simulated OCR errors on Boolean query models for information retrieval. We show that even relatively small amounts of such noise can have a significant impact. To address this issue, we formulate new variants of the traditional models by combining two classic paradigms for dealing with imprecise data: approximate string matching and fuzzy logic. Using a recall/precision analysis of an experiment involving nearly 60 million query evaluations, we demonstrate that the new fuzzy retrieval methods are generally more robust than their "sharp" counterparts.
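To make the idea concrete, here is a minimal sketch of how approximate string matching and fuzzy logic can be combined in a Boolean query model. It is not the paper's formulation. It assumes three things. First, a query term's membership in a document is 1 minus the normalized Levenshtein distance to the closest document word, and is 0 when that distance exceeds a hypothetical cutoff `max_dist`. Second, AND, OR and NOT use the standard Zadeh operators (min, max, 1 − x). Third, the names `edit_distance`, `term_membership` and `evaluate` are illustrative only. Setting `max_dist=0` recovers the "sharp" exact-match Boolean model, which makes the contrast with the fuzzy variant easy to see.

```python
from typing import List, Union

# Hypothetical query representation: a bare term, or a tuple (op, *subqueries).
Query = Union[str, tuple]


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with a single rolling DP row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,               # deletion
                           cur[j - 1] + 1,            # insertion
                           prev[j - 1] + (ca != cb))) # substitution
        prev = cur
    return prev[-1]


def term_membership(term: str, doc_words: List[str], max_dist: int) -> float:
    """Assumed membership degree: best 1 - d/len over words within max_dist edits."""
    best = 0.0
    for w in doc_words:
        d = edit_distance(term, w)
        if d <= max_dist:
            best = max(best, 1.0 - d / max(len(term), len(w), 1))
    return best


def evaluate(query: Query, doc_words: List[str], max_dist: int = 2) -> float:
    """Fuzzy Boolean evaluation with Zadeh operators; max_dist=0 gives sharp matching."""
    if isinstance(query, str):
        return term_membership(query, doc_words, max_dist)
    op, *args = query
    vals = [evaluate(q, doc_words, max_dist) for q in args]
    if op == "AND":
        return min(vals)
    if op == "OR":
        return max(vals)
    if op == "NOT":
        return 1.0 - vals[0]
    raise ValueError(f"unknown operator: {op}")


if __name__ == "__main__":
    # Simulated OCR noise: the final "l" of "retrieval" was recognized as "1".
    doc = ["fuzzy", "retrieva1", "models"]
    q = ("AND", "fuzzy", "retrieval")
    print(evaluate(q, doc, max_dist=0))  # sharp model: 0.0, the document is missed
    print(evaluate(q, doc))              # fuzzy model: about 0.889
```

With a single OCR substitution, the sharp AND query rejects the document outright. The fuzzy version still assigns it a high degree (1 − 1/9), so it can be ranked near the top. This illustrates, on a toy case, why graded matching tends to degrade more gracefully under noise than exact Boolean matching.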