Winnowing: local algorithms for document fingerprinting

  • Authors:
  • Saul Schleimer;Daniel S. Wilkerson;Alex Aiken

  • Affiliations:
  • University of Illinois, Chicago;Computer Science Division, UC Berkeley;Computer Science Division, UC Berkeley

  • Venue:
  • Proceedings of the 2003 ACM SIGMOD international conference on Management of data
  • Year:
  • 2003

Quantified Score

Hi-index 0.00

Visualization

Abstract

Digital content is for copying: quotation, revision, plagiarism, and file sharing all create copies. Document fingerprinting is concerned with accurately identifying copying, including small partial copies, within large sets of documents.We introduce the class of local document fingerprinting algorithms, which seems to capture an essential property of any finger-printing technique guaranteed to detect copies. We prove a novel lower bound on the performance of any local algorithm. We also develop winnowing, an efficient local fingerprinting algorithm, and show that winnowing's performance is within 33% of the lower bound. Finally, we also give experimental results on Web data, and report experience with MOSS, a widely-used plagiarism detection service.