Data deduplication using dynamic chunking algorithm

Authors:
Young Chan Moon, Ho Min Jung, Chuck Yoo, Young Woong Ko
Affiliations:
Dept. of Computer Engineering, Hallym University, Chuncheon, Korea (Young Chan Moon, Ho Min Jung, Young Woong Ko); Dept. of Computer Science and Engineering, Korea University, Seoul, Korea (Chuck Yoo)
Venue:
ICCCI'12: Proceedings of the 4th International Conference on Computational Collective Intelligence: Technologies and Applications, Volume Part II
Year:
2012

Abstract

Data deduplication is widely used in storage systems to eliminate duplicate data blocks. In this paper, we propose a dynamic chunking approach that combines fixed-length chunking with a file similarity technique. Fixed-length chunking suffers from the boundary-shift problem: inserting or deleting even a few bytes shifts every subsequent chunk boundary, so unchanged data after the edit no longer produces matching chunks, and deduplication performs poorly on files that contain shifted duplicate data. The key idea of this work is to exploit the duplicate-data information contained in the file similarity information. By comparing hash key values and file offsets within the file similarity information, we can easily find duplicated points, and we use these points as hints for where chunking should start. With this approach, we significantly improve the performance of a deduplication system based on fixed-length chunking. In our experiments, the proposed dynamic chunking significantly improved deduplication capability while keeping processing time comparable to that of fixed-length chunking.
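The abstract describes the mechanism only at a high level, so the following Python sketch illustrates the general idea under assumptions that are not taken from the paper: the file similarity information is modeled as a map from each stored fixed-length chunk's hash key (SHA-1 here) to its file offset; a duplicated point is any offset in the new file whose chunk-sized window hashes to a stored key; and the chunk size is a hypothetical 8 bytes. All helper names (key, fixed_chunks, similarity_info, dynamic_chunks, dedup_ratio) are hypothetical. Inserting one byte at the front of a file shifts every fixed-length boundary so that no chunk matches; restarting fixed-length chunking at each duplicated point realigns the boundaries with the stored chunks.

```python
import hashlib

CHUNK = 8  # hypothetical chunk size; real systems use KB-sized chunks


def key(block: bytes) -> str:
    """Hash key of one block (SHA-1 is an illustrative choice)."""
    return hashlib.sha1(block).hexdigest()


def fixed_chunks(data: bytes) -> list[bytes]:
    """Plain fixed-length chunking starting at offset 0."""
    return [data[i:i + CHUNK] for i in range(0, len(data), CHUNK)]


def similarity_info(data: bytes) -> dict[str, int]:
    """Assumed form of file similarity info: hash key -> offset of each fixed chunk."""
    return {key(c): i * CHUNK for i, c in enumerate(fixed_chunks(data))}


def dynamic_chunks(new: bytes, info: dict[str, int]) -> list[bytes]:
    """Fixed-length chunking that restarts at duplicated points.

    Assumed detection rule: a duplicated point is an offset whose CHUNK-byte
    window has a hash key present in the stored file's similarity info.
    """
    chunks: list[bytes] = []
    start = 0  # where the current fixed-length run began
    i = 0
    while i + CHUNK <= len(new):
        window = new[i:i + CHUNK]
        if key(window) in info:
            # Emit the bytes before the hint as ordinary fixed-length chunks,
            # then restart chunking at the duplicated point itself.
            chunks.extend(fixed_chunks(new[start:i]))
            chunks.append(window)
            i += CHUNK
            start = i
        else:
            i += 1  # no match here; slide the window by one byte
    chunks.extend(fixed_chunks(new[start:]))  # trailing bytes after the last hint
    return chunks


def dedup_ratio(chunks: list[bytes], info: dict[str, int]) -> float:
    """Fraction of chunks whose hash key is already stored."""
    if not chunks:
        return 0.0
    return sum(key(c) in info for c in chunks) / len(chunks)


if __name__ == "__main__":
    stored = bytes(range(64))  # 8 distinct 8-byte chunks
    info = similarity_info(stored)
    new = b"X" + stored  # one inserted byte shifts every boundary
    print(dedup_ratio(fixed_chunks(new), info))  # 0.0
    print(round(dedup_ratio(dynamic_chunks(new, info), info), 3))  # 0.889
```

The byte-by-byte scan recomputes a full hash at every offset, which keeps the sketch simple but is slow; a practical implementation would more likely use a rolling hash or check only candidate offsets suggested by the similarity information. The stored offsets are not needed to detect a match in this sketch, but the difference between a match's position in the new file and its stored offset gives the size of the boundary shift.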