Building a scalable and accurate copy detection mechanism

  • Authors:
  • Narayanan Shivakumar; Hector Garcia-Molina

  • Affiliations:
  • Department of Computer Science, Stanford, CA; Department of Computer Science, Stanford, CA

  • Venue:
  • Proceedings of the first ACM international conference on Digital libraries
  • Year:
  • 1996

Abstract

Often, publishers are reluctant to offer valuable digital documents on the Internet for fear that they will be re-transmitted or copied widely. A Copy Detection Mechanism can help identify such copying. For example, publishers may register their documents with a copy detection server, and the server can then automatically check public sources such as UseNet articles and Web sites for potential illegal copies. The server can search for exact copies, and also for cases where significant portions of documents have been copied. In this paper we study, for the first time, the performance of various copy detection mechanisms, including the disk storage requirements, main memory requirements, response times for registration, and response time for querying. We also contrast performance to the accuracy of the mechanisms (how well they detect partial copies). The results are obtained using SCAM, an experimental server we have implemented, and a collection of 50,000 netnews articles.