Because the WAN links that support cloud backup services have relatively low bandwidth, and the volume of backed-up data stored at service providers keeps growing, a deduplication scheme for the cloud backup environment must remove redundant data during backup operations, to reduce backup times and storage costs, and during restore operations, to reduce restore times. In this paper, we propose SAFE, a source deduplication framework for efficient cloud backup and restore operations. SAFE has three salient features: (1) Hybrid Deduplication, which combines global file-level and local chunk-level deduplication to achieve an optimal tradeoff between deduplication efficiency and overhead, and thus a short backup time; (2) Semantic-aware Elimination, which exploits file semantics to narrow the search space for redundant data in the hybrid deduplication process and so reduce deduplication overhead; and (3) Unmodified Data Removal, which excludes files and data chunks that have remained intact from data transmission for some restore operations. Through extensive experiments driven by real-world datasets, the SAFE framework is shown to maintain a much higher deduplication efficiency/overhead ratio than existing solutions, shortening backup time by an average of 38.7% and reducing restore time by a ratio of up to 9.7:1.
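The hybrid idea in feature (1) can be illustrated with a minimal sketch. This is not SAFE's implementation: the abstract does not specify the chunking method, hash function, or index layout, so fixed-size chunks, SHA-256 fingerprints, in-memory sets, and the names `plan_backup`, `global_file_index`, and `local_chunk_index` are all assumptions made for illustration. A file whose whole-file fingerprint is already in the (hypothetically shared) global index is skipped outright; otherwise only chunks absent from the client's local chunk index are queued for transmission.

```python
import hashlib

CHUNK_SIZE = 4096  # assumed fixed chunk size; the abstract does not specify one


def sha256_hex(data: bytes) -> str:
    """Fingerprint a byte string (SHA-256 is an assumption, not SAFE's stated choice)."""
    return hashlib.sha256(data).hexdigest()


def plan_backup(files, global_file_index, local_chunk_index, chunk_size=CHUNK_SIZE):
    """Decide what a client must send for a dict of {path: bytes}.

    global_file_index: set of whole-file fingerprints (hypothetically shared by all clients)
    local_chunk_index: set of chunk fingerprints known to this client only
    Returns (to_send, stats); to_send holds (path, offset, chunk_bytes) tuples.
    """
    to_send = []
    stats = {"files_skipped": 0, "chunks_skipped": 0, "chunks_sent": 0}
    for path, data in files.items():
        file_fp = sha256_hex(data)
        # Step 1: global file-level check; a hit skips the whole file.
        if file_fp in global_file_index:
            stats["files_skipped"] += 1
            continue
        # Step 2: local chunk-level check for files that are new as a whole.
        for offset in range(0, len(data), chunk_size):
            chunk = data[offset:offset + chunk_size]
            chunk_fp = sha256_hex(chunk)
            if chunk_fp in local_chunk_index:
                stats["chunks_skipped"] += 1
            else:
                local_chunk_index.add(chunk_fp)
                to_send.append((path, offset, chunk))
                stats["chunks_sent"] += 1
        global_file_index.add(file_fp)
    return to_send, stats


if __name__ == "__main__":
    demo_files = {
        "a.txt": b"x" * 8192,
        "b.txt": b"x" * 8192,                # identical to a.txt as a whole file
        "c.txt": b"x" * 4096 + b"y" * 4096,  # shares one chunk with a.txt
    }
    plan, summary = plan_backup(demo_files, set(), set())
    print(summary)
```

In this sketch the file-level pass is cheap (one fingerprint per file) and removes whole duplicates early, while the chunk-level pass catches partial overlap within one client's data; that division of labor is the tradeoff the abstract alludes to, though the real system's indexes and policies may differ.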