Efficient storage of virtual machine images

  • Authors:
  • Roland Schwarzkopf; Matthias Schmidt; Mathias Rüdiger; Bernd Freisleben

  • Affiliations:
  • University of Marburg, Marburg, Germany (all authors)

  • Venue:
  • Proceedings of the 3rd workshop on Scientific Cloud Computing Date
  • Year:
  • 2012

Abstract

Allowing users to build custom virtual machines as execution environments for their tasks provides flexibility for users and providers of Infrastructure-as-a-Service Clouds or virtualized Grid computing environments. The downside of this flexibility is the increasing storage requirement for virtual machines. This problem is further exacerbated if version histories of virtual machines are kept to facilitate reproducibility of scientific results. Additionally, the simplicity of virtual machine creation provided by corresponding tools invites users to create multiple virtual machines for different purposes, further increasing their number. However, the traditional way of storing virtual machines as image files does not scale well with an increasing number of virtual machines. Several approaches have been proposed to solve this problem, each with its own drawbacks. In this paper, the Marvin Image Store (MIS) is presented to efficiently store a large number of Linux virtual machine images including their version history, independent of the distribution and the type of file system. The MIS minimizes the space required to retain images by importing them into its repository using a file-based deduplication technique. Layered virtual machine images are used to reduce the time needed to import (updated) images and to reassemble them from the compositional manifests stored in the MIS. Furthermore, stored images can be mounted directly, which skips the reassembly process completely. Experimental results indicate that the storage requirements can be reduced by up to 94% compared to the original images. The import of layered virtual machine images is up to 78% faster than the import of regular virtual machine images, and the export is up to 84% faster.
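
The abstract does not describe how the MIS implements file-based deduplication, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes that an image is available as an unpacked directory tree, that file contents are identified by their SHA-256 digest, and that the repository is a plain in-memory dictionary. The names `import_image`, `export_image`, `store`, and `manifest` are hypothetical, and the sketch ignores file metadata, symbolic links, and layering.

```python
import hashlib
import os


def import_image(image_root, store):
    """Import an unpacked image (a directory tree) into `store`.

    `store` is a hypothetical repository: a dict mapping the SHA-256
    digest of a file's content to that content. The returned manifest
    maps each relative file path to the digest of its content.
    """
    manifest = {}
    for dirpath, _dirnames, filenames in os.walk(image_root):
        for name in filenames:
            full_path = os.path.join(dirpath, name)
            with open(full_path, "rb") as handle:
                data = handle.read()
            digest = hashlib.sha256(data).hexdigest()
            # Content already present in the store is not stored again.
            store.setdefault(digest, data)
            manifest[os.path.relpath(full_path, image_root)] = digest
    return manifest


def export_image(manifest, store, target_root):
    """Reassemble an image under `target_root` from its manifest."""
    for rel_path, digest in manifest.items():
        out_path = os.path.join(target_root, rel_path)
        os.makedirs(os.path.dirname(out_path) or ".", exist_ok=True)
        with open(out_path, "wb") as handle:
            handle.write(store[digest])
```

Under these assumptions, importing two versions of an image that differ in a single file adds only that one file to the store, while each version keeps its own manifest, so either version can be reassembled later with `export_image`.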