Comparison of texts streams in the presence of mild adversaries

Authors:
Michael Malkin;Ramarathnam Venkatesan
Affiliations:
Gates Building, Stanford, CA;One Microsoft Way, Redmond, WA
Venue:
ACSW Frontiers '05 Proceedings of the 2005 Australasian workshop on Grid computing and e-research - Volume 44
Year:
2005

Citing 10

Introduction to algorithms
Building a scalable and accurate copy detection mechanism. Proceedings of the first ACM international conference on Digital libraries
Syntactic clustering of the Web. Selected papers from the sixth international conference on World Wide Web
Foundations of statistical natural language processing
A small approximately min-wise independent family of hash functions. Proceedings of the tenth annual ACM-SIAM symposium on Discrete algorithms
Cryptography: Theory and Practice
Completeness and Robustness Properties of Min-Wise Independent Permutations. RANDOM-APPROX '99 Proceedings of the Third International Workshop on Approximation Algorithms for Combinatorial Optimization Problems: Randomization, Approximation, and Combinatorial Algorithms and Techniques
On the Resemblance and Containment of Documents. SEQUENCES '97 Proceedings of the Compression and Complexity of Sequences 1997
Winnowing: local algorithms for document fingerprinting. Proceedings of the 2003 ACM SIGMOD international conference on Management of data
Detecting digital copyright violations on the internet

Cited 3

Identifying and resolving hidden text salting. IEEE Transactions on Information Forensics and Security
Obfuscating plagiarism detection: vulnerabilities and solutions. Proceedings of the 12th International Conference on Computer Systems and Technologies
Web text data mining for building large scale language modelling corpus. TSD'11 Proceedings of the 14th international conference on Text, speech and dialogue

Abstract

Text sifting is a method of quickly and securely identifying documents for database searching, copy detection, duplicate email detection and plagiarism detection. A small amount of text is extracted from a document using hash functions and is used as the document's fingerprint. We build upon previous work by Broder et al. [4,5] and Heintze [8], specifically addressing a certain set of attacks that we discovered to be very powerful against previous systems. We achieve robustness against these attacks with a new selection process. We also give theoretical and experimental results for these and other attacks on text sifting functions.
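The general idea of extracting a small hash-selected fingerprint can be illustrated with a minimal sketch. It is not the selection process proposed in the paper, which the abstract does not specify. It assumes a simple "keep the smallest hashes" rule in the spirit of the min-wise sketching line of work cited above. The function names (shingles, fingerprint, resemblance), the word-window length k, the fingerprint size, and the use of truncated SHA-256 are all illustrative assumptions.

```python
import hashlib


def shingles(text, k=5):
    """Return all overlapping k-word windows of the text (lowercased)."""
    words = text.lower().split()
    return [" ".join(words[i:i + k]) for i in range(len(words) - k + 1)]


def hash64(s):
    """Map a string to a 64-bit integer (truncated SHA-256; an assumption)."""
    digest = hashlib.sha256(s.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def fingerprint(text, k=5, size=8):
    """Keep the `size` smallest distinct shingle hashes as the fingerprint."""
    hashes = {hash64(s) for s in shingles(text, k)}
    return set(sorted(hashes)[:size])


def resemblance(fp_a, fp_b, size=8):
    """Estimate document overlap from two fingerprints.

    Takes the `size` smallest hashes of the union and reports the
    fraction of them that appear in both fingerprints.
    """
    smallest = set(sorted(fp_a | fp_b)[:size])
    if not smallest:
        return 0.0
    return len(smallest & fp_a & fp_b) / len(smallest)


if __name__ == "__main__":
    doc = "the quick brown fox jumps over the lazy dog and runs far away from home"
    copy = doc.upper()  # a trivially altered copy
    print(resemblance(fingerprint(doc), fingerprint(copy)))
```

Because a fixed, predictable rule decides which hashes are kept, an adversary who knows the rule can try to edit exactly the selected regions. That kind of weakness is what motivates a more robust selection process.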