To design efficient deduplication, caching, and other management mechanisms for virtual machine (VM) images in Infrastructure as a Service (IaaS) clouds, it is essential to understand the level and pattern of similarity among VM images in real-world IaaS environments. This paper empirically analyzes the similarity within and between 525 VM images from a production IaaS cloud. Besides presenting the overall level of content similarity, we identify several factors that affect the similarity pattern, including image creation time and location in the image's address space. Moreover, we find that similarity between pairs of images exhibits high variance, and that an image is very likely to be more similar to a small subset of images than to all other images in the repository. Likewise, groups of data chunks often appear together in the same image. These image and chunk "clusters" can help predict future data accesses, and therefore provide important hints for cache placement, eviction, and prefetching.
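To make the notion of pairwise content similarity concrete, the following is a minimal sketch of one common way to estimate it, not the measurement procedure used in this paper. It assumes fixed-size 4 KiB chunks, SHA-1 chunk fingerprints, and Jaccard similarity over fingerprint sets; the chunk size, the hash, the metric, and all function names (`chunk_fingerprints`, `jaccard_similarity`, `pairwise_similarity`) are illustrative assumptions that the abstract does not specify.

```python
import hashlib
from itertools import combinations
from typing import Dict, Set, Tuple

CHUNK_SIZE = 4096  # assumed chunk size in bytes; not stated in the abstract


def chunk_fingerprints(data: bytes, chunk_size: int = CHUNK_SIZE) -> Set[str]:
    """Split data into fixed-size chunks and return the set of SHA-1 digests."""
    return {
        hashlib.sha1(data[offset:offset + chunk_size]).hexdigest()
        for offset in range(0, len(data), chunk_size)
    }


def jaccard_similarity(a: Set[str], b: Set[str]) -> float:
    """Return |a & b| / |a | b|; two empty sets are treated as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def pairwise_similarity(images: Dict[str, bytes]) -> Dict[Tuple[str, str], float]:
    """Compute the similarity of every unordered pair of images (hypothetical helper)."""
    prints = {name: chunk_fingerprints(data) for name, data in images.items()}
    return {
        (x, y): jaccard_similarity(prints[x], prints[y])
        for x, y in combinations(sorted(prints), 2)
    }


if __name__ == "__main__":
    # Tiny synthetic "images": two share a common base, one is unrelated.
    base = b"A" * CHUNK_SIZE + b"B" * CHUNK_SIZE
    images = {
        "img1": base + b"C" * CHUNK_SIZE,
        "img2": base + b"D" * CHUNK_SIZE,
        "img3": b"E" * CHUNK_SIZE + b"F" * CHUNK_SIZE + b"G" * CHUNK_SIZE,
    }
    for pair, score in pairwise_similarity(images).items():
        print(pair, score)
```

In this toy input, the two images derived from the same base score far higher against each other than against the unrelated image, which is the kind of skewed pairwise similarity that the image "clusters" above refer to. A real analysis would read image files from disk in a streaming fashion rather than holding whole images in memory.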