Content-based chunk placement scheme for decentralized deduplication on distributed file systems

  • Authors:
  • Keonwoo Kim; Jeehong Kim; Changwoo Min; Young Ik Eom

  • Affiliations:
  • College of Information and Communication Engineering, Sungkyunkwan University, Suwon, Korea; College of Information and Communication Engineering, Sungkyunkwan University, Suwon, Korea; College of Information and Communication Engineering, Sungkyunkwan University, Suwon, Korea and Samsung Electronics Co., Ltd., Suwon, Korea; College of Information and Communication Engineering, Sungkyunkwan University, Suwon, Korea

  • Venue:
  • ICCSA'13 Proceedings of the 13th international conference on Computational Science and Its Applications - Volume 1
  • Year:
  • 2013

Abstract

The rapid growth of data volume causes problems such as limited storage capacity and rising data management costs. Distributed File Systems (DFS) are widely used to store and manage massive data, and data deduplication schemes are being studied extensively as a way to reduce storage volume. Deduplication increases the available storage capacity by eliminating duplicate data, but the deduplication process itself incurs performance overhead, such as additional disk I/O. In this paper, we propose a content-based chunk placement scheme that increases the deduplication rate on a DFS. To avoid the performance overhead of the deduplication process, we run lessfs on each chunk server, so that deduplication is performed in a decentralized manner on each chunk server. Moreover, we use consistent hashing for chunk allocation and failure recovery. Our experimental results show that the proposed system reduces storage space by 60% compared with a system without consistent hashing.
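
The idea of content-based placement with consistent hashing can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the hash function, the number of virtual nodes, or the server naming, so SHA-1, 64 virtual nodes per server, and the names `ChunkRing`, `place`, and `cs1`..`cs3` are all assumptions made for illustration. The sketch only shows why identical chunks land on the same chunk server (where a local deduplicating file system such as lessfs could then detect them) and why removing a server remaps only the chunks that server held.

```python
import bisect
import hashlib


def _hash(data: bytes) -> int:
    """Map bytes to a point on the ring (SHA-1 is an assumed choice)."""
    return int.from_bytes(hashlib.sha1(data).digest(), "big")


class ChunkRing:
    """Hypothetical consistent-hash ring that places chunks by content."""

    def __init__(self, servers, vnodes=64):
        self.vnodes = vnodes   # virtual nodes per server (assumed value)
        self._points = []      # sorted list of (ring position, server)
        for server in servers:
            self.add_server(server)

    def add_server(self, server: str) -> None:
        # Each server owns several ring positions to even out the load.
        for i in range(self.vnodes):
            point = (_hash(f"{server}#{i}".encode()), server)
            bisect.insort(self._points, point)

    def remove_server(self, server: str) -> None:
        # Dropping a server's points hands its chunks to the next points
        # on the ring; chunks owned by other servers are unaffected.
        self._points = [p for p in self._points if p[1] != server]

    def place(self, chunk: bytes) -> str:
        """Return the server responsible for this chunk's content."""
        if not self._points:
            raise ValueError("no chunk servers in the ring")
        key = _hash(chunk)
        # First ring point at or after the key, wrapping around at the end.
        idx = bisect.bisect_left(self._points, (key,))
        return self._points[idx % len(self._points)][1]


ring = ChunkRing(["cs1", "cs2", "cs3"])

# Identical content always maps to the same chunk server.
same_server = ring.place(b"duplicate block") == ring.place(b"duplicate block")

# Simulate a failure of cs2 and check which chunks had to move.
chunks = [f"chunk-{i}".encode() for i in range(200)]
before = {c: ring.place(c) for c in chunks}
ring.remove_server("cs2")
after = {c: ring.place(c) for c in chunks}
moved = [c for c in chunks if before[c] != after[c]]
only_failed_moved = all(before[c] == "cs2" for c in moved)
```

Because placement depends only on chunk content, duplicates of a chunk written by different clients converge on one server without a central index, which is the property a decentralized deduplication design relies on.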