Measuring the Effects of OCR Errors on Similarity Linking

  • Authors:
  • Andreas Myka; Ulrich Güntzer

  • Venue:
  • ICDAR '97 Proceedings of the 4th International Conference on Document Analysis and Recognition
  • Year:
  • 1997

Abstract

The vector-space model offers a simple and robust model for information retrieval. Two kinds of similarity matter in this model: the similarity between queries and documents, and the similarity between documents themselves. Document similarities can be used to generate links that lead users from one document to related ones. Studies have shown that the vector-space model is robust to OCR processing when manually constructed queries are used. However, it is not clear whether the model remains robust to the data corruption caused by OCR engines when it is used for hypertext construction. In this paper, we describe the performance of automatic hypertext construction based on the vector-space model with regard to three measures: the number of overtakings within the rankings used, the accumulated distance of a document's position within the rankings, and a comparison based on recall-precision graphs.
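
As a rough illustration of the first two measures, the sketch below ranks link targets for one source document by cosine similarity, once on clean text and once on a corrupted copy, and then compares the two rankings. The abstract does not give the paper's exact definitions, so everything here is an assumption: raw term-frequency weighting, "overtakings" read as pairwise order inversions between the two rankings, and "accumulated distance" read as the sum of absolute rank shifts. The function names and the toy documents are hypothetical.

```python
import math
from collections import Counter


def tf_vector(text):
    # Assumption: plain term-frequency weights; the paper may use another scheme.
    return Counter(text.lower().split())


def cosine(a, b):
    # Cosine similarity between two sparse term-frequency vectors.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def rank_links(source_id, docs):
    # Rank all other documents by similarity to the source (ties broken by id).
    src = tf_vector(docs[source_id])
    scores = {
        d: cosine(src, tf_vector(text)) for d, text in docs.items() if d != source_id
    }
    return sorted(scores, key=lambda d: (-scores[d], d))


def overtakings(clean_rank, ocr_rank):
    # Assumed reading: number of document pairs whose relative order differs.
    pos = {d: i for i, d in enumerate(ocr_rank)}
    seq = [pos[d] for d in clean_rank]
    return sum(
        1
        for i in range(len(seq))
        for j in range(i + 1, len(seq))
        if seq[i] > seq[j]
    )


def accumulated_distance(clean_rank, ocr_rank):
    # Assumed reading: sum over documents of the absolute change in rank position.
    pos = {d: i for i, d in enumerate(ocr_rank)}
    return sum(abs(i - pos[d]) for i, d in enumerate(clean_rank))


# Hypothetical toy collection; document "b" is corrupted in the OCR version.
clean_docs = {
    "a": "vector space model retrieval",
    "b": "vector space model ranking",
    "c": "vector space geometry",
    "d": "cooking pasta recipes",
}
ocr_docs = dict(clean_docs, b="vect0r spacc m0del ranking")

clean_rank = rank_links("a", clean_docs)
ocr_rank = rank_links("a", ocr_docs)
n_overtakings = overtakings(clean_rank, ocr_rank)
total_shift = accumulated_distance(clean_rank, ocr_rank)
print(clean_rank, ocr_rank, n_overtakings, total_shift)
```

In this toy case the corrupted document "b" loses all shared terms with the source and drops below "c", which both measures register. A recall-precision comparison would additionally require relevance judgments, which the abstract does not describe, so it is left out of the sketch.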