Introduction to algorithms
Building a scalable and accurate copy detection mechanism
Proceedings of the first ACM international conference on Digital libraries
Syntactic clustering of the Web
Selected papers from the sixth international conference on World Wide Web
Foundations of statistical natural language processing
A small approximately min-wise independent family of hash functions
Proceedings of the tenth annual ACM-SIAM symposium on Discrete algorithms
Cryptography: Theory and Practice
Completeness and Robustness Properties of Min-Wise Independent Permutations
RANDOM-APPROX '99 Proceedings of the Third International Workshop on Approximation Algorithms for Combinatorial Optimization Problems: Randomization, Approximation, and Combinatorial Algorithms and Techniques
On the Resemblance and Containment of Documents
SEQUENCES '97 Proceedings of the Compression and Complexity of Sequences 1997
Winnowing: local algorithms for document fingerprinting
Proceedings of the 2003 ACM SIGMOD international conference on Management of data
Detecting digital copyright violations on the internet
Identifying and resolving hidden text salting
IEEE Transactions on Information Forensics and Security
Obfuscating plagiarism detection: vulnerabilities and solutions
Proceedings of the 12th International Conference on Computer Systems and Technologies
Web text data mining for building large scale language modelling corpus
TSD'11 Proceedings of the 14th international conference on Text, speech and dialogue
Text sifting is a method for quickly and securely identifying documents, with applications in database searching, copy detection, duplicate email detection, and plagiarism detection. Hash functions are used to select a small amount of text from a document, and this selected text serves as the document's fingerprint. We build upon previous work by Broder et al. [4,5] and Heintze [8], specifically addressing a class of attacks that we found to be highly effective against previous systems. We achieve robustness against these attacks with a new selection process. We also give theoretical and experimental results for these and other attacks on text sifting functions.
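
As a rough illustration of hash-based fingerprinting in general, the sketch below selects the word shingles with the smallest hash values, in the spirit of the earlier resemblance work the abstract builds on. It is not the new selection process described here, which the abstract does not specify. The function names (`shingles`, `fingerprint`, `resemblance`), the shingle width `k`, the fingerprint `size`, and the choice of SHA-256 are all assumptions made for the example.

```python
import hashlib
import re


def shingles(text, k=4):
    """Return the set of k-word shingles (overlapping word windows) in text."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        # Short texts yield a single shingle (or none if there are no words).
        return {" ".join(words)} if words else set()
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}


def hash64(s):
    """Map a string to a 64-bit integer (assumed choice: truncated SHA-256)."""
    digest = hashlib.sha256(s.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def fingerprint(text, k=4, size=8):
    """Keep the `size` smallest shingle hashes as the document's fingerprint."""
    return set(sorted(hash64(s) for s in shingles(text, k))[:size])


def resemblance(fp_a, fp_b, size=8):
    """Estimate shingle-set overlap from two fingerprints.

    Takes the `size` smallest hashes of the combined fingerprints and
    returns the fraction of them that appear in both fingerprints.
    """
    smallest = sorted(fp_a | fp_b)[:size]
    if not smallest:
        return 0.0
    both = fp_a & fp_b
    return sum(1 for x in smallest if x in both) / len(smallest)
```

Two documents can then be compared by their fingerprints alone, for example `resemblance(fingerprint(doc_a), fingerprint(doc_b))`, without storing or rescanning the full texts. A selection rule this simple is the kind of scheme an adversary may try to defeat by editing the few words that determine the selected shingles, which is the style of attack the abstract refers to.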