Chunking algorithms play an important role in hash-based data de-duplication systems. The Basic Sliding Window (BSW) algorithm is the first prototype of a content-based chunking algorithm that can handle most types of data. The Two Thresholds Two Divisors (TTTD) algorithm was proposed to improve on BSW by controlling chunk-size variation. We conducted a series of systematic experiments to evaluate the performance of these two algorithms. We also proposed a new improvement to the TTTD algorithm. Our new approach reduced the running time by about 6% and the number of large-sized chunks by 50%, and also brought other significant benefits.
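To make the idea of content-based chunking with two thresholds and two divisors concrete, the following is a minimal sketch in Python. It is not the implementation evaluated here, and it does not include the proposed improvement. It assumes a simple polynomial rolling hash over a fixed window and illustrative parameter values; the names `rolling_hashes`, `tttd_chunks`, `t_min`, `t_max`, `d_main`, `d_backup`, and `window` are hypothetical. A boundary is declared when the window hash matches the main divisor; a match on the smaller backup divisor is remembered and used as a fallback if the chunk reaches the maximum size without a main-divisor match.

```python
import random


def rolling_hashes(data: bytes, window: int) -> list[int]:
    """Polynomial hash of the last `window` bytes ending at each position.

    The hash function is an assumption made for this sketch.
    """
    base, mod = 257, (1 << 31) - 1
    top = pow(base, window - 1, mod)  # weight of the byte leaving the window
    hashes = []
    h = 0
    for i, b in enumerate(data):
        if i >= window:
            h = (h - data[i - window] * top) % mod  # drop the oldest byte
        h = (h * base + b) % mod  # shift and add the newest byte
        hashes.append(h)
    return hashes


def tttd_chunks(data: bytes, t_min: int = 64, t_max: int = 512,
                d_main: int = 128, d_backup: int = 64,
                window: int = 16) -> list[bytes]:
    """Split `data` into chunks using two thresholds and two divisors.

    All parameter values are illustrative assumptions, not tuned values.
    """
    hashes = rolling_hashes(data, window)
    chunks = []
    start = 0      # first byte of the chunk being built
    backup = None  # most recent backup breakpoint (exclusive end), if any
    i = 0
    while i < len(data):
        length = i - start + 1
        if length >= t_min:  # never cut before the minimum threshold
            h = hashes[i]
            if h % d_backup == d_backup - 1:
                backup = i + 1  # remember a fallback boundary
            cut = None
            if h % d_main == d_main - 1:
                cut = i + 1  # main divisor matched: cut here
            elif length >= t_max:
                # Maximum threshold reached: prefer the backup boundary,
                # otherwise force a cut at the maximum size.
                cut = backup if backup is not None else i + 1
            if cut is not None:
                chunks.append(data[start:cut])
                start = cut
                backup = None
                i = cut  # resume scanning right after the boundary
                continue
        i += 1
    if start < len(data):
        chunks.append(data[start:])  # trailing chunk may be shorter than t_min
    return chunks


if __name__ == "__main__":
    sample = random.Random(0).randbytes(20000)
    sizes = [len(c) for c in tttd_chunks(sample)]
    print(len(sizes), min(sizes), max(sizes))
```

Setting `t_min` to 1, `t_max` to a value larger than the input, and ignoring the backup divisor reduces this sketch to a plain sliding-window chunker with a single divisor, which is the kind of scheme the two thresholds are meant to constrain. Precomputing all window hashes costs memory proportional to the input and is done here only to keep the backup-breakpoint rewind simple.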