Signature extraction for overlap detection in documents

  • Authors:
  • Raphael A. Finkel; Arkady Zaslavsky; Krisztián Monostori; Heinz Schmidt

  • Affiliations:
  • University of Kentucky, Lexington, KY; Monash University, Melbourne, Australia; Monash University, Melbourne, Australia; Monash University, Melbourne, Australia

  • Venue:
  • ACSC '02 Proceedings of the twenty-fifth Australasian conference on Computer science - Volume 4
  • Year:
  • 2002

Abstract

Easy access to the Web has led to increased potential for students cheating on assignments by plagiarising others' work. By the same token, Web-based tools offer the potential for instructors to check submitted assignments for signs of plagiarism. Overlap-detection tools are easy to use and accurate in plagiarism detection, so they can be an excellent deterrent to plagiarism. Documents can overlap for other reasons, too: old documents are superseded, and authors summarize previous work identically in several papers. Overlap-detection tools can pinpoint interconnections in a corpus of documents and could be used in search engines.

We describe a web-accessible text registry based on signature extraction. We extract a small but diagnostic signature from each registered text for permanent storage and comparison against other stored signatures. This comparison allows us to estimate the amount of overlap between pairs of documents, while the total time required remains linear in the total size of the documents. We compare our algorithm with several alternatives and present both efficiency and accuracy results.
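The abstract does not spell out how signatures are built, so the following is only a rough illustration of the general idea, not the authors' algorithm. It assumes a common fingerprinting scheme: hash every run of k consecutive words, keep only the hashes divisible by a fixed modulus as the signature, and estimate overlap from the hashes two signatures share. The function names, the parameters k and keep_mod, and the use of MD5 are all hypothetical choices made for this sketch.

```python
import hashlib
import re


def signature(text, k=5, keep_mod=4):
    """Return a small set of chunk hashes that stands in for the text.

    Assumed scheme (not taken from the paper): hash every run of k
    consecutive words and keep a hash only if it is divisible by keep_mod.
    """
    words = re.findall(r"[a-z0-9]+", text.lower())
    sig = set()
    for i in range(len(words) - k + 1):
        chunk = " ".join(words[i:i + k])
        digest = hashlib.md5(chunk.encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")  # first 64 bits of the digest
        if h % keep_mod == 0:
            sig.add(h)
    return sig


def overlap_estimate(sig_a, sig_b):
    """Estimate overlap as the shared fraction of the smaller signature."""
    if not sig_a or not sig_b:
        return 0.0
    return len(sig_a & sig_b) / min(len(sig_a), len(sig_b))
```

Under these assumptions each document is scanned once, so building all signatures takes time linear in the total size of the documents, and only the small signature sets need to be stored and compared. Setting keep_mod=1 keeps every chunk hash, which is useful for checking the behaviour on short texts.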