Document Filtering for Fast Approximate String Matching of Erroneous Text

  • Authors:
  • Affiliations:
  • Venue: ICDAR '01 Proceedings of the Sixth International Conference on Document Analysis and Recognition
  • Year: 2001

Abstract

Making use of retrospective documents is important. OCR is the most widely applied technology for this purpose; however, OCR output contains recognition errors, so error-tolerant methods are essential for working with OCR-processed documents. This paper discusses a filtering problem for OCR-processed documents that makes it possible to handle large numbers of such documents in an error-tolerant way. We propose a systematic index design method for filtering and show that filtering speeds up matching by a factor of about 360 on a database of about two million records, with little decrease in accuracy.
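
The abstract does not describe the index design itself, so the following is only a generic illustration of the filter-then-verify idea behind fast approximate string matching, not the paper's method. It is a minimal sketch that assumes each record is a short string compared as a whole, uses a textbook q-gram count filter, and verifies survivors with Levenshtein distance. All names (`qgrams`, `passes_filter`, `search`) and the parameters `k` and `q` are hypothetical. Unlike the filter reported in the abstract, this one never discards a true match; it only skips the expensive comparison for records that cannot match.

```python
from collections import Counter


def qgrams(s, q=2):
    """Return the multiset of overlapping q-grams of s."""
    return Counter(s[i:i + q] for i in range(len(s) - q + 1))


def edit_distance(a, b):
    """Levenshtein distance via the standard two-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution or match
        prev = cur
    return prev[-1]


def passes_filter(query, record, k, q=2):
    """Count filter (illustrative assumption, not the paper's index).

    Two strings within k edits of each other share at least
    max(len) - q + 1 - k*q q-grams, so anything below that bound
    can be rejected without computing the edit distance.
    """
    shared = sum((qgrams(query, q) & qgrams(record, q)).values())
    needed = max(len(query), len(record)) - q + 1 - k * q
    return shared >= needed


def search(query, records, k, q=2):
    """Cheap filter first, exact verification only on the survivors."""
    candidates = [r for r in records if passes_filter(query, r, k, q)]
    return [r for r in candidates if edit_distance(query, r) <= k]


if __name__ == "__main__":
    # "recogniticn" imitates a single OCR character error.
    records = ["recognition", "recogniticn", "retrieval", "document"]
    print(search("recognition", records, k=1))
```

In this sketch the filter recomputes q-grams for every record; a practical system would precompute them into an inverted index so that most records are never touched at query time.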