Robust Retrieval of Noisy Text

  • Authors: Dan Lopresti

  • Venue: ADL '96 Proceedings of the 3rd International Forum on Research and Technology Advances in Digital Libraries
  • Year: 1996

Abstract

In this paper, we examine the effects of simulated OCR errors on Boolean query models for information retrieval. We show that even relatively small amounts of such noise can have a significant impact. To address this issue, we formulate new variants of the traditional models by combining two classic paradigms for dealing with imprecise data: approximate string matching and fuzzy logic. Using a recall/precision analysis of an experiment involving nearly 60 million query evaluations, we demonstrate that the new fuzzy retrieval methods are generally more robust than their "sharp" counterparts.
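
To make the combination described in the abstract concrete, here is a minimal sketch of a fuzzy Boolean query over OCR-damaged text. It is not the paper's formulation, which the abstract does not spell out. It assumes Levenshtein edit distance as the approximate string matcher, a term score of 1 - distance / term length, and the standard min/max/complement operators of fuzzy logic. All function names are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance, computed row by row with dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def term_score(term, doc_words):
    """Degree in [0, 1] to which `term` occurs in the document.

    Assumption: score = 1 - (best edit distance / term length), floored at 0.
    """
    if not doc_words:
        return 0.0
    best = min(edit_distance(term, w) for w in doc_words)
    return max(0.0, 1.0 - best / len(term))


# Standard (Zadeh) fuzzy connectives, assumed here for illustration.
def fuzzy_and(*scores):
    return min(scores)


def fuzzy_or(*scores):
    return max(scores)


def fuzzy_not(score):
    return 1.0 - score


# Simulated OCR noise: "m" read as "rn", "l" read as "1".
doc = "inforrnation retrieva1 of noisy text".split()

# A "sharp" Boolean AND fails because neither term matches exactly.
sharp = ("information" in doc) and ("retrieval" in doc)

# The fuzzy AND still assigns the document a high degree of match.
fuzzy = fuzzy_and(term_score("information", doc),
                  term_score("retrieval", doc))
```

In this sketch the sharp query rejects the document outright, while the fuzzy query returns a graded score that can be thresholded or used for ranking. That grading is what lets a retrieval model trade recall against precision under noise.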