An automatic linking service of document images reducing the effects of OCR errors with latent semantics

  • Authors:
  • Renato F. Bulcão-Neto;José Camacho-Guerrero;Álvaro Barreiro;Javier Parapar;Alessandra A. Macedo

  • Affiliations:
  • Innolution Sist. de Informática, Ribeirão Preto-SP, Brazil;Innolution Sist. de Informática, Ribeirão Preto-SP, Brazil;University of A Coruña, A Coruña, Spain;University of A Coruña, A Coruña, Spain;Universidade de São Paulo, Ribeirão Preto-SP, Brazil

  • Venue:
  • Proceedings of the 2010 ACM Symposium on Applied Computing
  • Year:
  • 2010

Abstract

The widespread, multipurpose use of document images and the large number of document image repositories now available have created a demand for robust Information Retrieval (IR) systems. This paper presents a novel approach that supports the automatic generation of relationships among document images by combining Optical Character Recognition (OCR) with Latent Semantic Indexing (LSI). The LinkDI service extracts and indexes the content of document images, derives its latent semantics, and defines relationships among the images as hyperlinks. LinkDI was tested on document image repositories, and its performance was evaluated by comparing the quality of the relationships created among textual documents with that of the relationships created among their corresponding document images. The results show that LinkDI can feasibly relate OCR output even when it is highly degraded.
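
The pipeline described in the abstract (index the OCR text, project it into a latent semantic space, link related documents) can be illustrated with a minimal sketch. This is not the LinkDI implementation: the function names, the toy corpus, the raw term counts, the rank k = 2, and the 0.5 similarity threshold are all assumptions chosen for illustration, and the OCR step is simulated by hand-typed text in which "m" has been misread as "rn".

```python
import re
import numpy as np

def tokenize(text):
    # Lowercase alphabetic tokens; a real system would also apply stop-word removal, stemming, etc.
    return re.findall(r"[a-z]+", text.lower())

def term_document_matrix(docs):
    # Rows are vocabulary terms, columns are documents, entries are raw term counts.
    vocab = sorted({t for d in docs for t in tokenize(d)})
    index = {t: i for i, t in enumerate(vocab)}
    A = np.zeros((len(vocab), len(docs)))
    for j, d in enumerate(docs):
        for t in tokenize(d):
            A[index[t], j] += 1.0
    return A, vocab

def lsi_document_vectors(A, k):
    # Truncated SVD: keep the k largest singular values and return
    # one k-dimensional latent vector per document (rows of V_k * Sigma_k).
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    k = min(k, len(s))
    return Vt[:k, :].T * s[:k]

def link_documents(doc_vecs, threshold):
    # Cosine similarity in the latent space; pairs at or above the
    # (hypothetical) threshold become candidate hyperlinks.
    norms = np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    norms[norms == 0] = 1.0
    unit = doc_vecs / norms
    sims = unit @ unit.T
    links = []
    for i in range(len(doc_vecs)):
        for j in range(i + 1, len(doc_vecs)):
            if sims[i, j] >= threshold:
                links.append((i, j, float(sims[i, j])))
    return links

# Toy corpus: document 1 is a simulated OCR reading of document 0.
docs = [
    "latent semantic indexing of document images",
    "latent sernantic indexing of docurnent images",
    "football match results and league scores",
]
A, vocab = term_document_matrix(docs)
doc_vecs = lsi_document_vectors(A, k=2)
for i, j, sim in link_documents(doc_vecs, threshold=0.5):
    print(f"link: doc {i} <-> doc {j} (cosine {sim:.2f})")
```

In this toy corpus the clean text and its degraded counterpart are linked, while the unrelated third document is not. The rank-2 projection discards the direction that separates the two readings, which is the kind of noise reduction the abstract attributes to latent semantics; real corpora would need a much larger collection, term weighting, and a tuned rank and threshold.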