PRUN: Eliminating Information Redundancy for Large Scale Data Backup System

  • Authors:
  • Youjip Won;Rakie Kim;Jongmyeong Ban;Jungpil Hur;Sangkyu Oh;Jangsun Lee

  • Venue:
  • ICCSA '08 Proceedings of the 2008 International Conference on Computational Sciences and Its Applications
  • Year:
  • 2008

Abstract

In this work, we develop a novel backup system, PRUN, for massive-scale data storage. PRUN aims to improve backup latency and reduce the storage overhead of backup by effectively eliminating information redundancy in files, both intra-file and inter-file. PRUN consists of a client module and a server module, and is built on three key technical ingredients: redundancy detection, a fingerprint manager, and a chunk manager. File chunking for redundancy detection is the most time-consuming task in backup. For efficient file chunking, we develop an incremental modulo-K algorithm, which significantly reduces file chunking time. We perform various experiments to measure the overhead of each task in the backup operation and to examine the efficiency of redundancy elimination. Incremental modulo-K reduces file chunking latency by approximately 60%. The redundancy elimination scheme reduces the storage requirement of backup by 80% when we back up different minor versions of the Linux 2.6 kernel source.
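
The abstract does not spell out the incremental modulo-K algorithm, so the following is only a minimal sketch of the general idea it appears to build on: content-defined chunking driven by a hash that is updated incrementally as a window slides over the file, followed by fingerprint-based elimination of duplicate chunks. The rolling polynomial hash, the SHA-1 fingerprints, and every name and parameter here (`split_chunks`, `backup`, `restore`, `window`, `k`, `base`, `max_size`) are assumptions made for illustration and may differ from PRUN's actual design.

```python
import hashlib


def split_chunks(data, window=16, k=251, base=257, max_size=4096):
    """Split data at content-defined boundaries (illustrative, not PRUN's code).

    A polynomial hash of the last `window` bytes is kept modulo `k` and
    updated in O(1) per byte instead of being recomputed from scratch.
    A boundary is declared when the hash is 0 or the chunk hits `max_size`.
    """
    top = pow(base, window - 1, k)  # weight of the byte leaving the window
    chunks = []
    start = 0
    h = 0
    for i, byte in enumerate(data):
        if i >= window:
            # Remove the contribution of the byte that slides out.
            h = (h - data[i - window] * top) % k
        # Shift the window and add the incoming byte.
        h = (h * base + byte) % k
        length = i + 1 - start
        if (length >= window and h == 0) or length >= max_size:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])  # trailing partial chunk
    return chunks


def backup(data, store):
    """Store only chunks not seen before; return the list of fingerprints."""
    recipe = []
    for chunk in split_chunks(data):
        fingerprint = hashlib.sha1(chunk).hexdigest()  # assumed fingerprint
        store.setdefault(fingerprint, chunk)
        recipe.append(fingerprint)
    return recipe


def restore(recipe, store):
    """Rebuild the original bytes from a fingerprint list."""
    return b"".join(store[fingerprint] for fingerprint in recipe)
```

Because each boundary decision depends mainly on the bytes inside the sliding window, a small edit to a file tends to disturb only the chunks near the edit, and the unchanged chunks map to fingerprints already in the store. Backing up two versions of a file through the same `store` dictionary therefore keeps one copy of each shared chunk, which is the kind of inter-file redundancy elimination the abstract describes.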