Disk-based deduplication storage has emerged as the new-generation storage system for enterprise data protection, replacing tape libraries. Deduplication removes redundant data segments to compress data into a highly compact form, making it economical to store backups on disk instead of tape. A crucial requirement for enterprise data protection is high throughput, typically over 100 MB/sec, so that backups complete quickly. A significant challenge is to identify and eliminate duplicate data segments at this rate on a low-cost system that cannot afford enough RAM to hold an index of the stored segments and may therefore be forced to access an on-disk index for every input segment. This paper describes three techniques employed in the production Data Domain deduplication file system to relieve this disk bottleneck: (1) the Summary Vector, a compact in-memory data structure for identifying new segments; (2) Stream-Informed Segment Layout, a data layout method that improves on-disk locality for sequentially accessed segments; and (3) Locality Preserved Caching, which maintains the locality of the fingerprints of duplicate segments to achieve high cache hit ratios. Together, they can remove 99% of the disk accesses for deduplication of real-world workloads. These techniques enable a modern two-socket dual-core system to run at 90% CPU utilization with only one shelf of 15 disks while achieving 100 MB/sec single-stream throughput and 210 MB/sec multi-stream throughput.
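The Summary Vector answers one question cheaply from RAM: "is this segment definitely new?" If the answer is yes, no index lookup is needed. The following is a minimal Python sketch that models the Summary Vector as a Bloom filter over segment fingerprints. The bit-array size, the number of hash functions, the SHA-256 double-hashing scheme used to derive bit positions, and the use of SHA-1 as the fingerprint in the usage example are all illustrative assumptions, not details taken from the text above.

```python
import hashlib


class SummaryVector:
    """Bloom-filter sketch of a Summary Vector over segment fingerprints.

    might_contain() returning False means the segment is definitely new, so
    no index lookup is needed; True means "possibly stored" (false positives
    are possible, false negatives are not).
    """

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        if num_bits <= 0 or num_hashes <= 0:
            raise ValueError("num_bits and num_hashes must be positive")
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, fingerprint: bytes):
        # Derive k bit positions by double hashing (h1 + i*h2) from one
        # SHA-256 digest; this derivation is an illustrative choice.
        digest = hashlib.sha256(fingerprint).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, fingerprint: bytes) -> None:
        # Set every bit position for this fingerprint.
        for pos in self._positions(fingerprint):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, fingerprint: bytes) -> bool:
        # All k bits set -> possibly stored; any bit clear -> definitely new.
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(fingerprint)
        )


if __name__ == "__main__":
    sv = SummaryVector(num_bits=1 << 20, num_hashes=4)
    fp = hashlib.sha1(b"example segment").digest()  # SHA-1 as the fingerprint
    print(sv.might_contain(fp))  # False: the filter is still empty
    sv.add(fp)
    print(sv.might_contain(fp))  # True
```

Because a Bloom filter never yields false negatives, a "definitely new" answer can safely skip the on-disk index. A false positive only costs one unnecessary lookup.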
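Stream-Informed Segment Layout keeps the new segments of each backup stream physically together on disk, so segments that are written in sequence are also stored in sequence. The following is a minimal Python sketch under stated assumptions. New segments are packed into fixed-capacity "containers", with one open container per stream. A container is sealed when the next segment would overflow it. A list of sealed containers stands in for disk writes. The names `Container`, `StreamInformedLayout`, `append`, and `flush` are hypothetical, and details such as compressing a container's data section are omitted.

```python
from dataclasses import dataclass, field


@dataclass
class Container:
    """A fixed-capacity unit of on-disk layout holding segments of one stream."""
    container_id: int
    fingerprints: list = field(default_factory=list)  # metadata section
    segments: list = field(default_factory=list)      # data section
    size: int = 0


class StreamInformedLayout:
    """Packs new segments into one open container per stream."""

    def __init__(self, container_capacity: int) -> None:
        self.capacity = container_capacity
        self.open = {}     # stream_id -> container currently being filled
        self.sealed = []   # stands in for containers written to disk
        self._next_id = 0

    def _new_container(self) -> Container:
        container = Container(self._next_id)
        self._next_id += 1
        return container

    def append(self, stream_id, fingerprint, segment: bytes) -> int:
        """Store a new segment; return the id of the container holding it."""
        container = self.open.get(stream_id)
        if container is None:
            container = self.open[stream_id] = self._new_container()
        elif container.size + len(segment) > self.capacity and container.segments:
            # Container is full: seal it (a real system would write it out).
            self.sealed.append(container)
            container = self.open[stream_id] = self._new_container()
        container.fingerprints.append(fingerprint)
        container.segments.append(segment)
        container.size += len(segment)
        return container.container_id

    def flush(self) -> None:
        """Seal all partially filled containers, e.g. at the end of a backup."""
        for container in self.open.values():
            if container.segments:
                self.sealed.append(container)
        self.open.clear()
```

Even when two streams' writes arrive interleaved, each sealed container holds segments from a single stream in arrival order. This is the locality that a later cache can exploit.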
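Locality Preserved Caching turns one on-disk index read into many cache hits. On a miss, it brings in the fingerprints of every segment stored alongside the one being looked up. The following is a minimal Python sketch under stated assumptions:

- Segments live in containers whose metadata lists their fingerprints.
- The cache holds whole containers' fingerprints with LRU eviction.
- A plain dict stands in for the on-disk index, which maps a fingerprint to its container id.
- A caller-supplied `might_be_stored` predicate plays the role of a summary check.

The names `LocalityPreservedCache`, `is_duplicate`, and `read_container_metadata`, and the `stats` counter, are hypothetical.

```python
from collections import OrderedDict


class LocalityPreservedCache:
    """Fingerprint cache managed in units of whole containers (LRU)."""

    def __init__(self, max_containers, read_container_metadata):
        self.max_containers = max_containers
        # Callable: container_id -> fingerprints in that container's metadata.
        self.read_container_metadata = read_container_metadata
        self.containers = OrderedDict()  # container_id -> fingerprints, LRU order
        self.fp_to_container = {}

    def lookup(self, fp):
        """Return the cached container id for fp, or None on a miss."""
        cid = self.fp_to_container.get(fp)
        if cid is not None:
            self.containers.move_to_end(cid)  # mark container as recently used
        return cid

    def load(self, cid):
        """Insert every fingerprint of container cid, evicting the LRU container."""
        if cid in self.containers:
            self.containers.move_to_end(cid)
            return
        if len(self.containers) >= self.max_containers:
            old_cid, old_fps = self.containers.popitem(last=False)
            for fp in old_fps:
                if self.fp_to_container.get(fp) == old_cid:
                    del self.fp_to_container[fp]
        fps = set(self.read_container_metadata(cid))
        self.containers[cid] = fps
        for fp in fps:
            self.fp_to_container[fp] = cid


def is_duplicate(fp, might_be_stored, cache, on_disk_index, stats):
    """Lookup path: summary check, then cache, then the on-disk index."""
    if not might_be_stored(fp):      # e.g. a Summary Vector says "definitely new"
        return False
    if cache.lookup(fp) is not None:
        return True                  # cache hit: no disk access
    stats["index_reads"] += 1        # stands in for one on-disk index read
    cid = on_disk_index.get(fp)
    if cid is None:
        return False                 # summary false positive: segment is new
    cache.load(cid)                  # prefetch the whole container's fingerprints
    return True
```

Suppose a backup stream repeats an earlier stream's segments in the same order. Then the first lookup into a container costs one index read, and the following lookups into that container hit the cache.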
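A back-of-the-envelope calculation shows why one index lookup per segment is a disk bottleneck at the 100 MB/sec target, and what removing 99% of those disk accesses buys. The 8 KB average segment size is an assumption made only for this illustration, and decimal units are used.

```latex
\[
\frac{100\ \text{MB/s}}{8\ \text{KB/segment}}
  = \frac{100 \times 10^{6}\ \text{B/s}}{8 \times 10^{3}\ \text{B}}
  = 12{,}500\ \text{lookups/s},
\qquad
\frac{12{,}500\ \text{lookups/s}}{15\ \text{disks}} \approx 833\ \text{lookups/s per disk}
\]
\[
(1 - 0.99) \times 12{,}500\ \text{lookups/s} = 125\ \text{index reads/s}
  \quad (\approx 8.3\ \text{per disk})
\]
```

Commodity hard disks typically sustain only on the order of a hundred or so random reads per second. Under this assumption, an uncached on-disk index would saturate the shelf long before 100 MB/sec. The residual 1% is comfortably within its random-read budget.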