Models and Algorithms for Duplicate Document Detection

  • Authors:
  • Daniel P. Lopresti

  • Affiliations:
  • -

  • Venue:
  • ICDAR '99 Proceedings of the Fifth International Conference on Document Analysis and Recognition
  • Year:
  • 1999

Quantified Score

Hi-index 0.00

Visualization

Abstract

This paper introduces a framework for clarifying and formalizing the duplicate document detection problem. Four distinct models are presented, each with a corresponding algorithm for its solution derived from the realm of approximate string matching. The robustness of these techniques is demonstrated through a set of experiments using data reflecting real-world degradation effects.