A low-bandwidth network file system
SOSP '01 Proceedings of the eighteenth ACM symposium on Operating systems principles
Hibernator: helping disk arrays sleep through the winter
Proceedings of the twentieth ACM symposium on Operating systems principles
Providing High Reliability in a Minimum Redundancy Archival Storage System
MASCOTS '06 Proceedings of the 14th IEEE International Symposium on Modeling, Analysis, and Simulation
ATEC '06 Proceedings of the annual conference on USENIX '06 Annual Technical Conference
Avoiding the disk bottleneck in the data domain deduplication file system
FAST'08 Proceedings of the 6th USENIX Conference on File and Storage Technologies
HPDC '08 Proceedings of the 17th international symposium on High performance distributed computing
Fast, inexpensive content-addressed storage in foundation
ATC'08 USENIX 2008 Annual Technical Conference on Annual Technical Conference
Can We Really Recover Data if Storage Subsystem Fails?
ICDCS '08 Proceedings of the 2008 The 28th International Conference on Distributed Computing Systems
Proceedings of the VLDB Endowment
Gordon: using flash memory to build fast, power-efficient clusters for data-intensive applications
Proceedings of the 14th international conference on Architectural support for programming languages and operating systems
Sparse indexing: large scale, inline deduplication using sampling and locality
FAST '09 Proccedings of the 7th conference on File and storage technologies
The effectiveness of deduplication on virtual machine disk images
SYSTOR '09 Proceedings of SYSTOR 2009: The Israeli Experimental Systems Conference
FAWN: a fast array of wimpy nodes
Proceedings of the ACM SIGOPS 22nd symposium on Operating systems principles
Lithium: virtual machine storage for the cloud
Proceedings of the 1st ACM symposium on Cloud computing
A New Buffer Cache Design Exploiting Both Temporal and Content Localities
ICDCS '10 Proceedings of the 2010 IEEE 30th International Conference on Distributed Computing Systems
I/O deduplication: utilizing content similarity to improve I/O performance
FAST'10 Proceedings of the 8th USENIX conference on File and storage technologies
HydraFS: a high-throughput file system for the HYDRAstor content-addressable storage system
FAST'10 Proceedings of the 8th USENIX conference on File and storage technologies
Bimodal content defined chunking for backup streams
FAST'10 Proceedings of the 8th USENIX conference on File and storage technologies
Difference engine: harnessing memory redundancy in virtual machines
OSDI'08 Proceedings of the 8th USENIX conference on Operating systems design and implementation
Decentralized deduplication in SAN cluster file systems
USENIX'09 Proceedings of the 2009 conference on USENIX Annual technical conference
ChunkStash: speeding up inline storage deduplication using flash memory
USENIXATC'10 Proceedings of the 2010 USENIX conference on USENIX annual technical conference
Exploiting Data Deduplication to Accelerate Live Virtual Machine Migration
CLUSTER '10 Proceedings of the 2010 IEEE International Conference on Cluster Computing
dedupv1: Improving deduplication throughput using solid state drives (SSD)
MSST '10 Proceedings of the 2010 IEEE 26th Symposium on Mass Storage Systems and Technologies (MSST)
A study of practical deduplication
FAST'11 Proceedings of the 9th USENIX conference on File and stroage technologies
Tradeoffs in scalable data routing for deduplication clusters
FAST'11 Proceedings of the 9th USENIX conference on File and stroage technologies
Cost effective storage using extent based dynamic tiering
FAST'11 Proceedings of the 9th USENIX conference on File and stroage technologies
Venti: a new approach to archival storage
FAST'02 Proceedings of the 1st USENIX conference on File and storage technologies
Building a high-performance deduplication system
USENIXATC'11 Proceedings of the 2011 USENIX conference on USENIX annual technical conference
USENIXATC'11 Proceedings of the 2011 USENIX conference on USENIX annual technical conference
CABdedupe: A Causality-Based Deduplication Performance Booster for Cloud Backup Services
IPDPS '11 Proceedings of the 2011 IEEE International Parallel & Distributed Processing Symposium
iDedup: latency-aware, inline data deduplication for primary storage
FAST'12 Proceedings of the 10th USENIX conference on File and Storage Technologies
Primary data deduplication-large scale study and system design
USENIX ATC'12 Proceedings of the 2012 USENIX conference on Annual Technical Conference
A study on data deduplication in HPC storage systems
SC '12 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
Improving restore speed for backup systems that use inline chunk-based deduplication
FAST'13 Proceedings of the 11th USENIX conference on File and Storage Technologies
Hi-index | 0.00 |
Data deduplication has been demonstrated to be an effective technique in reducing the total data transferred over the network and the storage space in cloud backup, archiving, and primary storage systems, such as VM (virtual machine) platforms. However, the performance of restore operations from a deduplicated backup can be significantly lower than that without deduplication. The main reason lies in the fact that a file or block is split into multiple small data chunks that are often located in different disks after deduplication, which can cause a subsequent read operation to invoke many disk IOs involving multiple disks and thus degrade the read performance significantly. While this problem has been by and large ignored in the literature thus far, we argue that the time is ripe for us to pay significant attention to it in light of the emerging cloud storage applications and the increasing popularity of the VM platform in the cloud. This is because, in a cloud storage or VM environment, a simple read request on the client side may translate into a restore operation if the data to be read or a VM suspended by the user was previously deduplicated when written to the cloud or the VM storage server, a likely scenario considering the network bandwidth and storage capacity concerns in such an environment. To address this problem, in this article, we propose SAR, an SSD (solid-state drive)-Assisted Read scheme, that effectively exploits the high random-read performance properties of SSDs and the unique data-sharing characteristic of deduplication-based storage systems by storing in SSDs the unique data chunks with high reference count, small size, and nonsequential characteristics. In this way, many read requests to HDDs are replaced by read requests to SSDs, thus significantly improving the read performance of the deduplication-based storage systems in the cloud. The extensive trace-driven and VM restore evaluations on the prototype implementation of SAR show that SAR outperforms the traditional deduplication-based and flash-based cache schemes significantly, in terms of the average response times.