The art of computer programming, volume 3: (2nd ed.) sorting and searching
A low-bandwidth network file system
SOSP '01 Proceedings of the eighteenth ACM symposium on Operating systems principles
Venti: A New Approach to Archival Storage
FAST '02 Proceedings of the Conference on File and Storage Technologies
Journal of Algorithms
Algorithms and data structures for flash memories
ACM Computing Surveys (CSUR)
Architecture-conscious hashing
DaMoN '06 Proceedings of the 2nd international workshop on Data management on new hardware
The Berkeley DB Book
FlashDB: dynamic self-tuning database for NAND flash
Proceedings of the 6th international conference on Information processing in sensor networks
Microhash: an efficient index structure for flash-based sensor devices
FAST'05 Proceedings of the 4th conference on USENIX Conference on File and Storage Technologies - Volume 4
A flash-memory based file system
TCON'95 Proceedings of the USENIX 1995 Technical Conference Proceedings
A log buffer-based flash translation layer using fully-associative sector translation
ACM Transactions on Embedded Computing Systems (TECS)
Communications of the ACM - Web science
BPLRU: a buffer management scheme for improving random writes in flash storage
FAST'08 Proceedings of the 6th USENIX Conference on File and Storage Technologies
Avoiding the disk bottleneck in the data domain deduplication file system
FAST'08 Proceedings of the 6th USENIX Conference on File and Storage Technologies
Design tradeoffs for SSD performance
ATC'08 USENIX 2008 Annual Technical Conference on Annual Technical Conference
Proceedings of the VLDB Endowment
Online maintenance of very large random samples on flash storage
Proceedings of the VLDB Endowment
Gordon: using flash memory to build fast, power-efficient clusters for data-intensive applications
Proceedings of the 14th international conference on Architectural support for programming languages and operating systems
Proceedings of the 14th international conference on Architectural support for programming languages and operating systems
Sparse indexing: large scale, inline deduplication using sampling and locality
FAST '09 Proceedings of the 7th conference on File and storage technologies
HYDRAstor: a Scalable Secondary Storage
FAST '09 Proceedings of the 7th conference on File and storage technologies
FlashLogging: exploiting flash devices for synchronous logging performance
Proceedings of the 2009 ACM SIGMOD International Conference on Management of data
FAWN: a fast array of wimpy nodes
Proceedings of the ACM SIGOPS 22nd symposium on Operating systems principles
HydraFS: a high-throughput file system for the HYDRAstor content-addressable storage system
FAST'10 Proceedings of the 8th USENIX conference on File and storage technologies
Bimodal content defined chunking for backup streams
FAST'10 Proceedings of the 8th USENIX conference on File and storage technologies
Cheap and large CAMs for high performance data-intensive networked systems
NSDI'10 Proceedings of the 7th USENIX conference on Networked systems design and implementation
Decentralized deduplication in SAN cluster file systems
USENIX'09 Proceedings of the 2009 conference on USENIX Annual technical conference
More Robust Hashing: Cuckoo Hashing with a Stash
SIAM Journal on Computing
FlashStore: high throughput persistent key-value store
Proceedings of the VLDB Endowment
FAST'11 Proceedings of the 9th USENIX conference on File and storage technologies
Leveraging value locality in optimizing NAND flash-based SSDs
FAST'11 Proceedings of the 9th USENIX conference on File and storage technologies
SSDAlloc: hybrid SSD/RAM memory management made easy
Proceedings of the 8th USENIX conference on Networked systems design and implementation
SkimpyStash: RAM space skimpy key-value store on flash-based storage
Proceedings of the 2011 ACM SIGMOD International Conference on Management of data
USENIXATC'11 Proceedings of the 2011 USENIX conference on USENIX annual technical conference
SCMFS: a file system for storage class memory
Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis
GHOST: GPGPU-offloaded high performance storage I/O deduplication for primary storage system
Proceedings of the 2012 International Workshop on Programming Models and Applications for Multicores and Manycores
Live deduplication storage of virtual machine images in an open-source cloud
Middleware'11 Proceedings of the 12th ACM/IFIP/USENIX international conference on Middleware
WAN optimized replication of backup datasets using stream-informed delta compression
FAST'12 Proceedings of the 10th USENIX conference on File and Storage Technologies
Shredder: GPU-accelerated incremental storage and computation
FAST'12 Proceedings of the 10th USENIX conference on File and Storage Technologies
iDedup: latency-aware, inline data deduplication for primary storage
FAST'12 Proceedings of the 10th USENIX conference on File and Storage Technologies
FAST'12 Proceedings of the 10th USENIX conference on File and Storage Technologies
A study of space reclamation on flash-based append-only storage management
DASFAA'12 Proceedings of the 17th international conference on Database Systems for Advanced Applications
Primary data deduplication-large scale study and system design
USENIX ATC'12 Proceedings of the 2012 USENIX conference on Annual Technical Conference
BVSSD: build built-in versioning flash-based solid state drives
Proceedings of the 5th Annual International Systems and Storage Conference
Reducing impact of data fragmentation caused by in-line deduplication
Proceedings of the 5th Annual International Systems and Storage Conference
WAN-optimized replication of backup datasets using stream-informed delta compression
ACM Transactions on Storage (TOS)
Droplet: A Distributed Solution of Data Deduplication
GRID '12 Proceedings of the 2012 ACM/IEEE 13th International Conference on Grid Computing
Live deduplication storage of virtual machine images in an open-source cloud
Proceedings of the 12th International Middleware Conference
Block locality caching for data deduplication
Proceedings of the 6th International Systems and Storage Conference
SCMFS: A File System for Storage Class Memory and its Extensions
ACM Transactions on Storage (TOS)
SAFE: A Source Deduplication Framework for Efficient Cloud Backup Services
Journal of Signal Processing Systems
Read-Performance Optimization for Deduplication-Based Storage Systems in the Cloud
ACM Transactions on Storage (TOS)
Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles
ACM SIGOPS 24th Symposium on Operating Systems Principles
Tango: distributed data structures over a shared log
Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles
Triple-A: a Non-SSD based autonomic all-flash array for high performance storage systems
Proceedings of the 19th international conference on Architectural support for programming languages and operating systems
Storage deduplication has received recent interest in the research community. In scenarios where the backup process has to complete within short time windows, inline deduplication can help achieve higher backup throughput. In such systems, the method of identifying duplicate data, using disk-based indexes on chunk hashes, can create throughput bottlenecks because of the disk I/Os involved in index lookups. The RAM prefetching and Bloom filter based techniques used by Zhu et al. [42] can avoid disk I/Os on close to 99% of index lookups. Even at this reduced rate, index lookups that go to disk contribute about 0.1 ms to the average lookup time, which is about 1000 times slower than a lookup that hits in RAM. We propose to reduce the penalty of index lookup misses in RAM by orders of magnitude by serving such lookups from a flash-based index, thereby increasing inline deduplication throughput. Flash memory can reduce the huge gap between RAM and hard disk in terms of both cost and access times, and is a suitable choice for this application. To this end, we design a flash-assisted inline deduplication system using ChunkStash, a chunk metadata store on flash. ChunkStash uses one flash read per chunk lookup and works in concert with RAM prefetching strategies. It organizes chunk metadata in a log structure on flash to exploit fast sequential writes, and it uses an in-memory hash table to index the metadata, with hash collisions resolved by a variant of cuckoo hashing. The in-memory hash table stores compact key signatures (2 bytes) instead of full chunk-ids (20-byte SHA-1 hashes) to trade off RAM usage against false flash reads. Further, by indexing only a small fraction of the chunks in each container, ChunkStash can reduce RAM usage significantly with negligible loss in deduplication quality. Evaluations using real-world enterprise backup datasets show that ChunkStash outperforms a hard-disk-index-based inline deduplication system by 7x-60x on the metric of backup throughput (MB/sec).
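The index organization described in the abstract can be illustrated with a minimal Python sketch. It is not the ChunkStash implementation. The class name `ChunkIndexSketch`, the slot count, the number of hash functions, the relocation limit, and the `overflow` fallback are all assumptions made for illustration, and a Python list stands in for the log on flash. The sketch shows an append-only log of chunk metadata, an in-memory table whose slots hold only a 2-byte signature and a log offset, cuckoo-style relocation on insert, and a lookup that touches the log only when a signature matches.

```python
import hashlib


class ChunkIndexSketch:
    """Toy model: RAM table of (signature, offset) over an append-only log."""

    def __init__(self, num_slots=1024, num_hashes=4, max_kicks=32):
        self.log = []                    # stands in for the metadata log on flash
        self.slots = [None] * num_slots  # RAM table: (2-byte signature, log offset)
        self.overflow = {}               # assumed fallback when relocation gives up
        self.num_hashes = num_hashes
        self.max_kicks = max_kicks
        self.flash_reads = 0             # simulated flash reads issued by lookups

    def _hash(self, salt, chunk_id):
        digest = hashlib.sha1(salt + chunk_id).digest()
        return int.from_bytes(digest[:8], "big")

    def _positions(self, chunk_id):
        # One candidate slot per hash function.
        return [self._hash(bytes([i]), chunk_id) % len(self.slots)
                for i in range(self.num_hashes)]

    def _signature(self, chunk_id):
        # Compact 2-byte signature kept in RAM instead of the 20-byte chunk-id.
        return self._hash(b"sig", chunk_id) & 0xFFFF

    def insert(self, chunk_id, metadata):
        self.log.append((chunk_id, metadata))  # sequential append to the log
        key = chunk_id
        entry = (self._signature(key), len(self.log) - 1)
        for kick in range(self.max_kicks):
            positions = self._positions(key)
            for pos in positions:
                if self.slots[pos] is None:
                    self.slots[pos] = entry
                    return
            # All candidates are full: displace one occupant and re-place it.
            victim = positions[kick % self.num_hashes]
            entry, self.slots[victim] = self.slots[victim], entry
            # Recovering the victim's full key would cost a flash read
            # in a real system; here it is a list access.
            key = self.log[entry[1]][0]
        self.overflow[key] = entry[1]

    def lookup(self, chunk_id):
        sig = self._signature(chunk_id)
        for pos in self._positions(chunk_id):
            slot = self.slots[pos]
            if slot is not None and slot[0] == sig:
                self.flash_reads += 1    # read the log entry to confirm the key
                stored_id, metadata = self.log[slot[1]]
                if stored_id == chunk_id:
                    return metadata
                # Otherwise a false flash read: signature matched, key did not.
        offset = self.overflow.get(chunk_id)
        return None if offset is None else self.log[offset][1]


index = ChunkIndexSketch(num_slots=256)
chunk_ids = [hashlib.sha1(str(i).encode()).digest() for i in range(200)]
for i, chunk_id in enumerate(chunk_ids):
    index.insert(chunk_id, {"container": i // 16})
found = [index.lookup(chunk_id) for chunk_id in chunk_ids]
missing = index.lookup(hashlib.sha1(b"not stored").digest())
```

In this sketch a stored chunk is normally confirmed with a single log read. If the signatures are assumed to be uniformly distributed, a different key passes the 2-byte check at a given occupied slot with probability of about 1 in 65,536, which is the source of the false flash reads that the abstract weighs against RAM usage.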