Measuring the Effects of OCR Errors on Similarity Linking

Authors:
Andreas Myka;Ulrich Güntzer
Venue:
ICDAR '97 Proceedings of the 4th International Conference on Document Analysis and Recognition
Year:
1997

Abstract

The vector-space model offers a simple and robust model for information retrieval. In this model, the similarities between queries and documents, as well as the similarities between documents themselves, are important. Document similarities can be used to generate links between documents that lead users from one document to related ones. Studies have shown that the vector-space model is robust in the context of OCR processing if manually constructed queries are used. However, it is not clear whether the model, when used for hypertext construction, is robust to the data corruption caused by OCR engines. In this paper, we describe the performance of automatic hypertext construction based on the vector-space model with regard to three measures: the number of overtakings within the rankings used, the accumulated distance of a document's position within the rankings, and a comparison based on recall-precision graphs.
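
The abstract names the measures without defining them, so the following is only a minimal sketch of how similarity linking and the first two measures might be computed. It assumes raw term-frequency vectors with cosine similarity, reads "overtakings" as the number of document pairs whose relative order differs between the clean-text ranking and the OCR-text ranking, and reads "accumulated distance" as the sum of absolute position shifts. All function names and the toy corpora are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def tf_vector(text):
    """Raw term-frequency vector of a lowercased, whitespace-tokenized text."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity of two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def rank_links(source, corpus):
    """Rank all other documents by similarity to `source`, best first."""
    src = tf_vector(corpus[source])
    scores = {
        doc: cosine(src, tf_vector(text))
        for doc, text in corpus.items()
        if doc != source
    }
    # Descending score; ties are broken by document id so the order is stable.
    return sorted(scores, key=lambda doc: (-scores[doc], doc))


def overtakings(reference, observed):
    """Assumed definition: pairs whose relative order differs between rankings."""
    pos = {doc: i for i, doc in enumerate(observed)}
    n = len(reference)
    return sum(
        1
        for i in range(n)
        for j in range(i + 1, n)
        if pos[reference[i]] > pos[reference[j]]
    )


def accumulated_distance(reference, observed):
    """Assumed definition: sum of absolute rank shifts over all documents."""
    pos = {doc: i for i, doc in enumerate(observed)}
    return sum(abs(i - pos[doc]) for i, doc in enumerate(reference))


# Toy corpora (invented): document "b" is corrupted by simulated OCR errors.
clean = {
    "a": "vector space model retrieval",
    "b": "vector space model ranking",
    "c": "vector space hypertext links",
    "d": "ocr engines corrupt text",
}
ocr = dict(clean, b="vcctor spacc modcl ranking")

clean_ranking = rank_links("a", clean)
ocr_ranking = rank_links("a", ocr)
print(clean_ranking, ocr_ranking)
print(overtakings(clean_ranking, ocr_ranking))
print(accumulated_distance(clean_ranking, ocr_ranking))
```

In this toy case the corrupted document "b" no longer shares any terms with "a", so it drops below "c" in the OCR-based ranking. A weighting scheme such as tf-idf could be substituted for the raw term frequencies without changing the two comparison functions.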