Comparison of texts streams in the presence of mild adversaries

  • Authors:
  • Michael Malkin; Ramarathnam Venkatesan

  • Affiliations:
  • Gates Building, Stanford, CA; One Microsoft Way, Redmond, WA

  • Venue:
  • ACSW Frontiers '05 Proceedings of the 2005 Australasian workshop on Grid computing and e-research - Volume 44
  • Year:
  • 2005

Abstract

Text sifting is a method of quickly and securely identifying documents for database searching, copy detection, duplicate email detection and plagiarism detection. A small amount of text is extracted from a document using hash functions and is used as the document's fingerprint. We build upon previous work by Broder et al. [4,5] and Heintze [8], specifically addressing a certain set of attacks that we discovered to be very powerful against previous systems. We achieve robustness against these attacks with a new selection process. We also give theoretical and experimental results for these and other attacks on text sifting functions.
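
To make the fingerprinting idea concrete, here is a minimal sketch of one common hash-based selection scheme in the spirit of the shingling work cited above: hash every overlapping window of k words and keep only the smallest few hash values as the fingerprint. This is an illustrative assumption, not the selection process proposed in the paper. The function names (`shingles`, `fingerprint`, `resemblance`), the window length `k`, the fingerprint `size`, and the use of truncated SHA-256 are all hypothetical choices made for the example.

```python
import hashlib
import re


def shingles(text, k=4):
    """Return the set of overlapping k-word windows in the text."""
    # Normalise case and punctuation so trivial edits do not change shingles.
    words = re.findall(r"[a-z0-9]+", text.lower())
    if len(words) < k:
        return {" ".join(words)} if words else set()
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}


def fingerprint(text, k=4, size=8):
    """Keep the `size` smallest 64-bit shingle hashes as the fingerprint."""
    hashes = {
        int.from_bytes(hashlib.sha256(s.encode("utf-8")).digest()[:8], "big")
        for s in shingles(text, k)
    }
    return frozenset(sorted(hashes)[:size])


def resemblance(fp_a, fp_b):
    """Jaccard overlap of two fingerprints, a rough similarity estimate."""
    if not fp_a and not fp_b:
        return 1.0
    return len(fp_a & fp_b) / len(fp_a | fp_b)
```

Two documents can then be compared through `resemblance(fingerprint(doc_a), fingerprint(doc_b))` without retaining the full texts. A scheme this simple is the kind of baseline an adversary might try to evade with small, targeted edits, which is the setting the abstract describes.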