A case for tracking and exploiting inter-node and intra-node memory content sharing in virtualized large-scale parallel systems

Authors:
Lei Xia;Peter A. Dinda
Affiliations:
Northwestern University, Evanston, IL, USA;Northwestern University, Evanston, IL, USA
Venue:
Proceedings of the 6th international workshop on Virtualization Technologies in Distributed Computing
Year:
2012

Abstract

In virtualized large-scale parallel systems, scientific workloads consist of numerous processes running across many virtual nodes. Their memory footprint is massive, and this has consequences for services that enhance performance, reliability, or power. We argue that a service that dynamically tracks the sharing of memory content, both within individual nodes and across nodes, can simplify and enhance the implementation of such services. For example, leveraging content sharing could significantly reduce the size of a checkpoint of a group of nodes. As another example, it could speed VM migration by allowing the reconstruction of a VM's memory from multiple source VMs. Finally, a service that improves reliability by introducing memory redundancy could leverage existing content sharing to minimize the memory cost of any particular level of redundancy. We argue that both intra- and inter-node memory content sharing is common in parallel applications, and we support this claim with a detailed study of both kinds of sharing at different scales, different granularities, and different times for a range of applications and application benchmarks. We then describe the high-level approach we are taking to design and implement a distributed, VMM-based system that can efficiently and scalably identify and track such sharing with low overhead.
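To make the distinction between intra-node and inter-node sharing concrete, the following is a minimal sketch in Python, not the authors' system. It assumes that memory is compared in fixed-size blocks, that two blocks count as shared when their SHA-1 digests match, and that each node's memory is available as a byte string. The names `page_hashes`, `sharing_summary`, and `PAGE_SIZE` are hypothetical and chosen only for illustration.

```python
import hashlib
from collections import defaultdict

PAGE_SIZE = 4096  # assumed block granularity in bytes (hypothetical default)


def page_hashes(memory: bytes, page_size: int = PAGE_SIZE) -> list[str]:
    """Return one content hash per fixed-size block of a memory image."""
    return [
        hashlib.sha1(memory[off:off + page_size]).hexdigest()
        for off in range(0, len(memory), page_size)
    ]


def sharing_summary(nodes: dict[str, bytes],
                    page_size: int = PAGE_SIZE) -> dict[str, int]:
    """Count total blocks, intra-node duplicates, and inter-node duplicates."""
    total = 0
    intra_dupes = 0
    per_node_unique = {}
    for name, mem in nodes.items():
        hashes = page_hashes(mem, page_size)
        total += len(hashes)
        unique = set(hashes)
        # Blocks repeated inside the same node are intra-node duplicates.
        intra_dupes += len(hashes) - len(unique)
        per_node_unique[name] = unique

    # Count how many nodes hold each distinct block.
    owners = defaultdict(int)
    for unique in per_node_unique.values():
        for h in unique:
            owners[h] += 1
    # A block present on k nodes contributes k - 1 inter-node duplicates.
    inter_dupes = sum(k - 1 for k in owners.values())

    return {
        "total_pages": total,
        "intra_node_duplicates": intra_dupes,
        "inter_node_duplicates": inter_dupes,
        "unique_pages": len(owners),
    }


if __name__ == "__main__":
    # Toy images with 4-byte "pages": node a holds A, A, B; node b holds A, C.
    toy = {"a": b"AAAAAAAABBBB", "b": b"AAAACCCC"}
    print(sharing_summary(toy, page_size=4))
```

In this sketch, `unique_pages` is the number of blocks a content-deduplicated checkpoint of the whole group would need to store, which is the kind of saving the abstract refers to. A real tracking service would have to work incrementally and in a distributed fashion rather than hashing full memory images in one place.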