An Evaluation of Information Retrieval Accuracy with Simulated OCR Output

  • Authors:
  • W. B. Croft;S. Harding;K. Taghva;J. Borsack

  • Venue:
  • An Evaluation of Information Retrieval Accuracy with Simulated OCR Output
  • Year:
  • 1993

Abstract

Optical Character Recognition (OCR) is a critical part of many text-based applications. Although some commercial systems use the output from OCR devices to index documents without editing, there is very little quantitative data on the impact of OCR errors on the accuracy of a text retrieval system. Because of the difficulty of constructing test collections to obtain this data, we have carried out evaluations using simulated OCR output on a variety of databases. The results show that output from high-quality OCR devices has little effect on the accuracy of retrieval, but low-quality devices used with databases of short documents can result in significant degradation.