UniFI: leveraging non-volatile memories for a unified fault tolerance and idle power management technique

  • Authors:
  • Somayeh Sardashti; David A. Wood

  • Affiliations:
  • University of Wisconsin-Madison, Madison, WI, USA; University of Wisconsin-Madison, Madison, WI, USA

  • Venue:
  • Proceedings of the 26th ACM international conference on Supercomputing
  • Year:
  • 2012

Abstract

Continued technology scaling presents new challenges for system-level fault tolerance and power management. Decreasing device sizes increases the likelihood of both transient and permanent faults. Increasing device count, together with the end of Dennard scaling, makes power a critical design constraint. Techniques that improve system reliability frequently consume more power; similarly, many techniques that reduce power hurt system reliability. Ideally, system designers should seek techniques that benefit both fault tolerance and power management. In this paper, we develop a unified technique, called UniFI, for fault tolerance and idle power management in shared-memory multi-core systems. UniFI leverages emerging non-volatile memory technologies to provide an energy-efficient, lightweight checkpointing technique. In addition to tolerating a large class of faults, UniFI's frequent checkpoints permit a near-instant transition to a deep sleep mode to reduce idle power. UniFI incurs very low performance and energy overheads during fault-free execution (less than 2%) while taking checkpoints every 0.1 ms. For typical server workloads (such as DNS), UniFI reduces average power by 82% by shutting off during idle periods.
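The central idea above is that one mechanism, frequent checkpoints to non-volatile memory, serves both goals: after a fault the core rolls back to the last checkpoint, and before entering deep sleep it only has to flush the little state modified since that checkpoint. The Python sketch below is a toy model of that interplay, not UniFI's hardware design. The class `ToyCore`, its dictionaries standing in for registers and caches, and the dirty-set incremental checkpoint are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class ToyCore:
    """Hypothetical toy model of a core checkpointed to non-volatile memory.

    Illustration only; it does not reproduce UniFI's actual design.
    """
    volatile: Dict[str, int] = field(default_factory=dict)  # lost when power is cut
    nvm: Dict[str, int] = field(default_factory=dict)       # survives power-off
    dirty: Set[str] = field(default_factory=set)            # keys written since last checkpoint
    powered: bool = True

    def write(self, key: str, value: int) -> None:
        assert self.powered, "core is asleep"
        self.volatile[key] = value
        self.dirty.add(key)

    def checkpoint(self) -> int:
        """Copy only the dirty entries to NVM; return how many were written."""
        for key in self.dirty:
            self.nvm[key] = self.volatile[key]
        written = len(self.dirty)
        self.dirty.clear()
        return written

    def recover(self) -> None:
        """Roll back to the last checkpoint after a detected fault."""
        self.volatile = dict(self.nvm)
        self.dirty.clear()

    def sleep(self) -> int:
        """Flush the (small) dirty set, then cut power, discarding volatile state."""
        written = self.checkpoint()
        self.volatile.clear()
        self.powered = False
        return written

    def wake(self) -> None:
        """Restore volatile state from NVM on power-up."""
        self.volatile = dict(self.nvm)
        self.powered = True


# Hypothetical usage: a periodic checkpoint, a fault, then an idle period.
core = ToyCore()
core.write("r1", 10)
core.write("r2", 20)
core.checkpoint()       # stands in for the periodic (e.g. every 0.1 ms) checkpoint
core.write("r1", 99)    # work done after the checkpoint...
core.recover()          # ...is discarded when a fault is detected
core.write("r3", 5)
flushed = core.sleep()  # only entries dirtied since the last checkpoint are flushed
core.wake()
```

Because checkpoints are frequent, the dirty set at the moment the core goes idle is small, so `sleep()` does little work; that is the intuition behind the near-instant transition to deep sleep described in the abstract.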