Group 4 Compressed Document Matching

Authors:
Dar-Shyang Lee;Jonathan J. Hull
Venue:
DAS '98 Selected Papers from the Third IAPR Workshop on Document Analysis Systems: Theory and Practice
Year:
1998

Abstract

Numerous approaches to detecting duplicate documents, including textual, structural and featural methods, have been investigated. Because document images are usually stored and transmitted in compressed form, it is advantageous to perform document matching directly on the compressed data. A two-stage process for matching Group 4 compressed document images is presented. In the coarse matching stage, ranked hypotheses are generated based on compression bit profile correlations. These candidates are then evaluated further using a feature set similar to the pass codes. Multiple descriptors based on the local arrangement of the feature points are constructed for efficient indexing into the database. Performance of the algorithm on the UW database is discussed.
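
The coarse matching stage can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that a "bit profile" is the number of compressed bits spent on each scanline and that profiles are compared with a plain Pearson correlation, neither of which the abstract specifies. The names `pearson` and `rank_candidates` and the sample profiles are hypothetical, and the second stage (pass-code-like features and local descriptors) is not covered.

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation of two equal-length sequences.

    Returns 0.0 for empty or constant input, where correlation is undefined.
    """
    n = len(a)
    if n == 0:
        return 0.0
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / sqrt(var_a * var_b)


def rank_candidates(query_profile, database, top_k=3):
    """Rank database documents by profile correlation, best first."""
    scores = []
    for doc_id, profile in database.items():
        # Assumption: compare only the scanlines both profiles share.
        n = min(len(query_profile), len(profile))
        scores.append((doc_id, pearson(query_profile[:n], profile[:n])))
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:top_k]


# Hypothetical bit profiles (compressed bits per scanline), made up for
# illustration only.
database = {
    "doc_a": [12, 40, 38, 10, 8, 35, 33, 9],
    "doc_b": [30, 30, 5, 5, 30, 30, 5, 5],
    "doc_c": [9, 9, 9, 40, 40, 9, 9, 9],
}
query = [13, 41, 36, 11, 9, 34, 35, 8]  # a slightly perturbed copy of doc_a

ranked = rank_candidates(query, database)
for doc_id, score in ranked:
    print(doc_id, round(score, 3))
```

In a two-stage design of this kind, the ranked list would serve only as a shortlist; the more expensive feature-based comparison would then be run on the top candidates.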