Copy Detection Systems for Digital Documents

  • Authors: Douglas M. Campbell; Wendy R. Chen; Randy D. Smith

  • Venue: ADL '00 Proceedings of the IEEE Advances in Digital Libraries 2000
  • Year: 2000

Abstract

Partial or total duplication of document content is common in large digital libraries. In this paper, we present a copy detection system that automates the detection of duplication in digital documents. The system is sentence-based and makes three contributions: it proposes an intuitive definition of similarity between documents; it produces the distribution of overlap between overlapping documents; and it is resistant to inaccuracy caused by large variations in document size. We report the results of several experiments that illustrate the behavior and functionality of the system.
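
The abstract does not spell out the system's similarity definition, so the following is only a minimal sketch of what a sentence-based overlap measure could look like, not the authors' method. It assumes exact matching of normalized sentences, a naive punctuation-based sentence splitter, and an asymmetric pair of scores (one per document) as one plausible way to stay meaningful when the two documents differ greatly in size. The names split_sentences, overlap, and overlap_positions are hypothetical.

```python
import re


def split_sentences(text):
    """Split text on ., ! or ? and normalize each sentence (assumed, naive splitter)."""
    parts = re.split(r"[.!?]+", text)
    # Lowercase and collapse whitespace so trivial formatting differences do not matter.
    return [" ".join(p.lower().split()) for p in parts if p.strip()]


def overlap(doc_a, doc_b):
    """Return (fraction of A's sentences found in B, fraction of B's sentences found in A).

    Reporting both directions is an assumption of this sketch: a short document
    fully contained in a long one scores 1.0 in one direction even though the
    long document is only partly covered.
    """
    sents_a = split_sentences(doc_a)
    sents_b = split_sentences(doc_b)
    set_a, set_b = set(sents_a), set(sents_b)
    a_in_b = sum(s in set_b for s in sents_a) / len(sents_a) if sents_a else 0.0
    b_in_a = sum(s in set_a for s in sents_b) / len(sents_b) if sents_b else 0.0
    return a_in_b, b_in_a


def overlap_positions(doc_a, doc_b):
    """Flag, for each sentence of A in order, whether it also occurs in B.

    This gives a crude picture of where the overlap is distributed within A.
    """
    set_b = set(split_sentences(doc_b))
    return [s in set_b for s in split_sentences(doc_a)]


small = "Cats sleep a lot. Dogs bark."
large = "Birds sing. Cats sleep a lot. Dogs bark. Fish swim."

print(overlap(small, large))            # (1.0, 0.5)
print(overlap_positions(large, small))  # [False, True, True, False]
```

In this toy case the short document is entirely contained in the long one, which the first score shows, while the second score and the per-sentence flags show that only the middle half of the long document is duplicated.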