Group 4 Compressed Document Matching

  • Authors:
  • Dar-Shyang Lee;Jonathan J. Hull

  • Affiliations:
  • -;-

  • Venue:
  • DAS '98 Selected Papers from the Third IAPR Workshop on Document Analysis Systems: Theory and Practice
  • Year:
  • 1998

Quantified Score

Hi-index 0.00

Visualization

Abstract

Numerous approaches, including textual, structural and featural, for detecting duplicate documents have been investigated. Considering document images are usually stored and transmitted in compressed forms, it is advantageous to perform document matching directly on the compressed data. A two-stage process for matching Group 4 compressed document images is presented. In the coarse matching stage, ranked hypotheses axe generated based on compression bit profile correlations. These candidates are further evaluated using a feature set similar to the pass codes. Multiple descriptors based on local arrangement of the feature points are constructed for efficient indexing into the database. Performance of the algorithm on the UW database is discussed.