NV-process: a fault-tolerance process model based on non-volatile memory

  • Authors:
  • Xu Li;Kai Lu;Xiaoping Wang;Xu Zhou

  • Affiliations:
  • National University of Defense Technology, China;National University of Defense Technology, China;National University of Defense Technology, China;National University of Defense Technology, China

  • Venue:
  • Proceedings of the Asia-Pacific Workshop on Systems
  • Year:
  • 2012

Abstract

The reliability wall is one of the most challenging problems for next-generation High Performance Computing (HPC) systems. Traditional system designs add extra fault-tolerance mechanisms. However, these mechanisms can themselves incur a high cost and thereby reduce the utilization of the HPC system. To address this problem, we present NV-process, a fault-tolerance process model based on non-volatile RAM (NVRAM). NV-process instances run in a self-contained way in NVRAM and can therefore survive operating system reboots. NV-process provides an elegant way for applications to tolerate system crashes. We implement a prototype of NV-process on Linux and analyze its advantages over traditional fault-tolerance mechanisms for future HPC applications.
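
The idea of state that outlives a restart can be illustrated with a minimal sketch. It is not the NV-process prototype: it assumes an ordinary file mapped with `mmap` as a stand-in for an NVRAM region, and the names `run_steps`, `demo`, and the single-counter layout are hypothetical, chosen only for illustration.

```python
import mmap
import os
import struct
import tempfile

# Hypothetical layout: one unsigned 64-bit counter stored at offset 0.
STATE_FORMAT = "<Q"
STATE_SIZE = struct.calcsize(STATE_FORMAT)


def run_steps(path, steps):
    """Map the backing file, advance the counter `steps` times, return its value.

    The file stands in for an NVRAM region; this is an emulation, not the
    mechanism used by the NV-process prototype.
    """
    # Create and zero-fill the region on first use.
    if not os.path.exists(path) or os.path.getsize(path) < STATE_SIZE:
        with open(path, "wb") as f:
            f.write(b"\x00" * STATE_SIZE)

    with open(path, "r+b") as f:
        with mmap.mmap(f.fileno(), STATE_SIZE) as region:
            # Resume from whatever an earlier run left in the region.
            (counter,) = struct.unpack_from(STATE_FORMAT, region, 0)
            for _ in range(steps):
                counter += 1
                # Write progress straight into the mapped region.
                struct.pack_into(STATE_FORMAT, region, 0, counter)
            region.flush()  # push dirty pages to the backing file
            return counter


def demo():
    """Simulate two runs separated by a restart; the second resumes the first."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "nv_region.bin")
        first = run_steps(path, 3)
        second = run_steps(path, 2)
        return first, second
```

Calling `demo()` runs the computation twice against the same region. The second call picks up the counter where the first left off rather than starting from zero, which is the property the abstract attributes to processes whose state lives in NVRAM. Real NVRAM would avoid the file system round trip that this emulation relies on.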