Data deduplication has been an effective way to eliminate redundant data, mainly in backup storage systems. Since primary storage systems in recent cloud services are also expected to contain substantial redundancy, deduplication can bring significant cost savings to primary storage as well. However, primary storage imposes a high performance requirement of several GB/s, whereas most conventional deduplication techniques target 200-300 MB/s. To build a high-performance deduplication system for primary storage, we thoroughly analyze the performance bottlenecks of previous deduplication systems. With recent improvements in flash devices such as SSDs, the bottleneck of deduplication in primary storage lies not only in key-value store lookup but also in the computation for data segmentation and fingerprinting. To overcome these bottlenecks, we propose a new deduplication system that utilizes a GPGPU. Our proposed system, termed GHOST, offloads and optimizes deduplication processing on the GPGPU with three techniques: (1) an in-host data cache, (2) destage-aware data offloading to the GPGPU, and (3) an in-GPGPU table cache for the key-value store. These techniques improve the offloaded deduplication processing by about 10-20% on a realistic primary-storage workload compared to a naive approach. Our deduplication system achieves up to 1.5 GB/s, about 5 times the throughput of CPU-only deduplication systems.
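To make the pipeline stages concrete, the following is a minimal CPU-only sketch of the three steps the abstract identifies as the workload: content-defined data segmentation (here with a simple cumulative byte hash, not the paper's actual chunking function), SHA-256 fingerprinting, and a key-value index lookup. All function names, hash parameters, and chunk-size limits are illustrative assumptions, not the GHOST implementation; GHOST offloads the segmentation and fingerprinting stages to a GPGPU.

```python
import hashlib

def chunk_boundaries(data, mask=(1 << 12) - 1,
                     min_size=2048, max_size=16384):
    """Content-defined chunking sketch: declare a chunk boundary when a
    simple rolling byte hash matches `mask` (after min_size bytes), or
    when max_size is reached. Parameters are illustrative, not GHOST's."""
    boundaries = []
    start, h = 0, 0
    for i, b in enumerate(data):
        h = (h * 31 + b) & 0xFFFFFFFF
        size = i - start + 1
        if (size >= min_size and (h & mask) == mask) or size >= max_size:
            boundaries.append((start, i + 1))
            start, h = i + 1, 0
    if start < len(data):
        boundaries.append((start, len(data)))
    return boundaries

def deduplicate(data, index=None):
    """Fingerprint each chunk with SHA-256 and look it up in a key-value
    index (a plain dict standing in for the key-value store). Returns
    (recipe, index, bytes_written): `recipe` is the list of fingerprints
    needed to reconstruct `data`; only chunks absent from the index count
    toward bytes_written."""
    index = {} if index is None else index
    recipe, written = [], 0
    for s, e in chunk_boundaries(data):
        fp = hashlib.sha256(data[s:e]).hexdigest()
        if fp not in index:
            index[fp] = data[s:e]   # store only previously unseen chunks
            written += e - s
        recipe.append(fp)
    return recipe, index, written
```

Writing the same data a second time against a warm index stores zero new bytes, and concatenating the indexed chunks in recipe order reconstructs the original data; in a real system each of these stages (boundary scan, hashing, index probe) is a candidate for parallel offload.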