Using content-derived names for configuration management
Proceedings of the 1997 symposium on Software reusability
A low-bandwidth network file system
SOSP '01 Proceedings of the eighteenth ACM symposium on Operating systems principles
Venti: A New Approach to Archival Storage
FAST '02 Proceedings of the Conference on File and Storage Technologies
Reclaiming Space from Duplicate Files in a Serverless Distributed File System
ICDCS '02 Proceedings of the 22 nd International Conference on Distributed Computing Systems (ICDCS'02)
Xen and the art of virtualization
SOSP '03 Proceedings of the nineteenth ACM symposium on Operating systems principles
Deep Store: An Archival Storage System Architecture
ICDE '05 Proceedings of the 21st International Conference on Data Engineering
Pastiche: making backup cheap and easy
OSDI '02 Proceedings of the 5th symposium on Operating systems design and implementationCopyright restrictions prevent ACM from being able to make the PDFs for this conference available for downloading
Linux Journal
Efficient archival data storage
Efficient archival data storage
QEMU, a fast and portable dynamic translator
ATEC '05 Proceedings of the annual conference on USENIX Annual Technical Conference
Alternatives for detecting redundancy in storage systems data
ATEC '04 Proceedings of the annual conference on USENIX Annual Technical Conference
An analysis of compare-by-hash
HOTOS'03 Proceedings of the 9th conference on Hot Topics in Operating Systems - Volume 9
Shark: scaling file servers via cooperative caching
NSDI'05 Proceedings of the 2nd conference on Symposium on Networked Systems Design & Implementation - Volume 2
ATEC '06 Proceedings of the annual conference on USENIX '06 Annual Technical Conference
Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining
Avoiding the disk bottleneck in the data domain deduplication file system
FAST'08 Proceedings of the 6th USENIX Conference on File and Storage Technologies
Proceedings of the 4th ACM international workshop on Storage security and survivability
Difference engine: harnessing memory redundancy in virtual machines
OSDI'08 Proceedings of the 8th USENIX conference on Operating systems design and implementation
Experiences with content addressable storage and virtual disks
WIOV'08 Proceedings of the First conference on I/O virtualization
Finding collisions in the full SHA-1
CRYPTO'05 Proceedings of the 25th annual international conference on Advances in Cryptology
Moving from logical sharing of guest OS to physical sharing of deduplication on virtual machine
HotSec'10 Proceedings of the 5th USENIX conference on Hot topics in security
A study of practical deduplication
FAST'11 Proceedings of the 9th USENIX conference on File and stroage technologies
Building a high-performance deduplication system
USENIXATC'11 Proceedings of the 2011 USENIX conference on USENIX annual technical conference
USENIXATC'11 Proceedings of the 2011 USENIX conference on USENIX annual technical conference
A study of practical deduplication
ACM Transactions on Storage (TOS)
An empirical analysis of similarity in virtual machine images
Proceedings of the Middleware 2011 Industry Track Workshop
Live deduplication storage of virtual machine images in an open-source cloud
Middleware'11 Proceedings of the 12th ACM/IFIP/USENIX international conference on Middleware
Efficient storage of virtual machine images
Proceedings of the 3rd workshop on Scientific Cloud Computing Date
Singleton: system-wide page deduplication in virtual environments
Proceedings of the 21st international symposium on High-Performance Parallel and Distributed Computing
Generating realistic datasets for deduplication analysis
USENIX ATC'12 Proceedings of the 2012 USENIX conference on Annual Technical Conference
Insights for data reduction in primary storage: a practical analysis
Proceedings of the 5th Annual International Systems and Storage Conference
A study on data deduplication in HPC storage systems
SC '12 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
Droplet: A Distributed Solution of Data Deduplication
GRID '12 Proceedings of the 2012 ACM/IEEE 13th International Conference on Grid Computing
Live deduplication storage of virtual machine images in an open-source cloud
Proceedings of the 12th International Middleware Conference
Rangoli: space management in deduplication environments
Proceedings of the 6th International Systems and Storage Conference
Proceedings of the 8th International Conference on Network and Service Management
RevDedup: a reverse deduplication storage system optimized for reads to latest backups
Proceedings of the 4th Asia-Pacific Workshop on Systems
Read-Performance Optimization for Deduplication-Based Storage Systems in the Cloud
ACM Transactions on Storage (TOS)
Leveraging collaborative content exchange for on-demand VM multi-deployments in iaas clouds
Euro-Par'13 Proceedings of the 19th international conference on Parallel Processing
DupLESS: server-aided encryption for deduplicated storage
SEC'13 Proceedings of the 22nd USENIX conference on Security
Efficiently storing virtual machine backups
HotStorage'13 Proceedings of the 5th USENIX conference on Hot Topics in Storage and File Systems
File recipe compression in data deduplication systems
FAST'13 Proceedings of the 11th USENIX conference on File and Storage Technologies
Operational experiences with disk imaging in a multi-tenant datacenter
NSDI'14 Proceedings of the 11th USENIX Conference on Networked Systems Design and Implementation
Hi-index | 0.00 |
Virtualization is becoming widely deployed in servers to efficiently provide many logically separate execution environments while reducing the need for physical servers. While this approach saves physical CPU resources, it still consumes large amounts of storage because each virtual machine (VM) instance requires its own multi-gigabyte disk image. Moreover, existing systems do not support ad hoc block sharing between disk images, instead relying on techniques such as overlays to build multiple VMs from a single "base" image. Instead, we propose the use of deduplication to both reduce the total storage required for VM disk images and increase the ability of VMs to share disk blocks. To test the effectiveness of deduplication, we conducted extensive evaluations on different sets of virtual machine disk images with different chunking strategies. Our experiments found that the amount of stored data grows very slowly after the first few virtual disk images if only the locale or software configuration is changed, with the rate of compression suffering when different versions of an operating system or different operating systems are included. We also show that fixed-length chunks work well, achieving nearly the same compression rate as variable-length chunks. Finally, we show that simply identifying zero-filled blocks, even in ready-to-use virtual machine disk images available online, can provide significant savings in storage.