kMemvisor: flexible system wide memory mirroring in virtual environments

Authors:
Bin Wang; Zhengwei Qi; Haibing Guan; Haoliang Dong; Wei Sun; Yaozu Dong
Affiliations:
School of Software, Shanghai Jiao Tong University, Shanghai, China; School of Software, Shanghai Jiao Tong University, Shanghai, China; School of Software, Shanghai Jiao Tong University, Shanghai, China; School of Software, Shanghai Jiao Tong University, Shanghai, China; School of Software, Shanghai Jiao Tong University, Shanghai, China; Intel China Software Center, Shanghai, China
Venue:
Proceedings of the 22nd international symposium on High-performance parallel and distributed computing
Year:
2013


Abstract

Today's commercial cloud service providers must guarantee an annual uptime of at least 99.95%. Meanwhile, as the density and capacity of memory in cloud systems grow, memory errors are becoming the norm rather than the exception, so uncorrected DRAM errors can be a significant source of system downtime. To address this increasingly important concern, both hardware and software memory mirroring technologies have been studied to provide memory high availability. However, hardware solutions such as mirrored memory, which doubles the memory chips, require dedicated and costly peripheral hardware, while existing software approaches, namely virtual machine checkpointing, reduce the expense but incur high overhead in practice. In this paper, we present kMemvisor, a novel system that provides system-wide, high-availability memory mirroring. It is a software approach that achieves flexible, multi-granularity memory mirroring through virtualization and binary translation. Specifically, kMemvisor first creates a backup space of the same size as the specified memory of applications or virtual machines. Memory areas can be flexibly set to be mirrored or not, at any granularity from a single application up to the whole system. Then, every memory write instruction in the native memory space is captured and instrumented with a mirror memory write instruction that synchronizes the data in the backup space. This instruction-level synchronization reduces backup overhead and lowers the probability of data loss compared with traditional software approaches, so kMemvisor can recover from memory failures using the data in the backup space. The results show that kMemvisor incurs 55% overhead in the worst case of system-wide high availability and 30% on average for real-world applications, outperforming state-of-the-art software approaches even in the worst case.
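The mirroring scheme summarized above (a backup space of the same size, a mirror store emitted after each native store to a mirrored area, and recovery from the backup on failure) can be illustrated with a toy Python model. This is a minimal sketch under stated assumptions, not kMemvisor's implementation: kMemvisor instruments memory write instructions through binary translation, whereas here the "instrumentation" is simply a second write inside a hypothetical `MirroredMemory.write` method. The 4 KiB page granularity, the `inject_failure` hook (which corrupts a page by flipping its bits and flags it as failed, standing in for a hardware-reported uncorrectable error), and all class and method names are assumptions made for illustration.

```python
PAGE_SIZE = 4096  # assumed mirroring granularity (one 4 KiB page), not from the paper


class UncorrectableMemoryError(Exception):
    """Raised when a failed page has no mirrored copy to recover from."""


class MirroredMemory:
    """Hypothetical toy model: a primary memory space plus a same-sized backup space."""

    def __init__(self, size: int) -> None:
        self.size = size
        self.primary = bytearray(size)
        self.backup = bytearray(size)  # backup space of the same size as the primary
        self.mirrored_pages: set[int] = set()
        self.failed_pages: set[int] = set()

    @staticmethod
    def _pages(addr: int, length: int) -> range:
        """Page numbers touched by the byte range [addr, addr + length)."""
        return range(addr // PAGE_SIZE, (addr + length - 1) // PAGE_SIZE + 1)

    def _check_bounds(self, addr: int, length: int) -> None:
        if addr < 0 or length < 0 or addr + length > self.size:
            raise IndexError("access outside the simulated memory")

    def set_mirrored(self, addr: int, length: int, enabled: bool = True) -> None:
        """Turn mirroring on or off for every page overlapping the range."""
        self._check_bounds(addr, length)
        for page in self._pages(addr, length):
            if enabled:
                lo, hi = page * PAGE_SIZE, (page + 1) * PAGE_SIZE
                # Seed the backup so it matches data written before mirroring began.
                self.backup[lo:hi] = self.primary[lo:hi]
                self.mirrored_pages.add(page)
            else:
                self.mirrored_pages.discard(page)

    def write(self, addr: int, data: bytes) -> None:
        """A native store followed by its instrumented mirror store."""
        self._check_bounds(addr, len(data))
        self.primary[addr:addr + len(data)] = data  # original write
        for i, byte in enumerate(data):  # mirror write, only for mirrored pages
            if (addr + i) // PAGE_SIZE in self.mirrored_pages:
                self.backup[addr + i] = byte

    def inject_failure(self, page: int) -> None:
        """Assumed stand-in for an uncorrectable error: corrupt a page and flag it."""
        lo, hi = page * PAGE_SIZE, min((page + 1) * PAGE_SIZE, self.size)
        for i in range(lo, hi):
            self.primary[i] ^= 0xFF
        self.failed_pages.add(page)

    def _recover(self, page: int) -> None:
        if page not in self.mirrored_pages:
            raise UncorrectableMemoryError(f"page {page} failed and is not mirrored")
        lo, hi = page * PAGE_SIZE, (page + 1) * PAGE_SIZE
        self.primary[lo:hi] = self.backup[lo:hi]  # restore the page from the backup
        self.failed_pages.discard(page)

    def read(self, addr: int, length: int) -> bytes:
        """Read bytes, first repairing any failed page the read touches."""
        self._check_bounds(addr, length)
        for page in self._pages(addr, length):
            if page in self.failed_pages:
                self._recover(page)
        return bytes(self.primary[addr:addr + length])


if __name__ == "__main__":
    mem = MirroredMemory(4 * PAGE_SIZE)
    mem.set_mirrored(0, PAGE_SIZE)          # mirror page 0 only
    mem.write(100, b"critical")             # native store + mirror store
    mem.write(PAGE_SIZE + 100, b"scratch")  # native store only
    mem.inject_failure(0)
    print(mem.read(100, 8))                 # b'critical', restored from backup
    mem.inject_failure(1)
    try:
        mem.read(PAGE_SIZE + 100, 7)
    except UncorrectableMemoryError as err:
        print(err)                          # page 1 failed and is not mirrored
```

In this model, calling `set_mirrored(..., enabled=False)` removes a page's mirror stores, which is the knob that trades protection for fewer extra writes; it loosely corresponds to the abstract's point that memory areas can be set to be mirrored or not, although how kMemvisor itself exposes that choice is not described here.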