StealthWorks: emulating memory errors

  • Authors:
  • Musfiq Rahman; Bruce R. Childers; Sangyeun Cho

  • Affiliations:
  • Computer Science Department, University of Pittsburgh, Pittsburgh, PA; Computer Science Department, University of Pittsburgh, Pittsburgh, PA; Computer Science Department, University of Pittsburgh, Pittsburgh, PA

  • Venue:
  • RV'10 Proceedings of the First international conference on Runtime verification
  • Year:
  • 2010

Abstract

A study of Google's data center revealed that the incidence of main memory errors is surprisingly high. These errors can lead to application and system corruption, impacting reliability. The high error rate is an indication that new resiliency techniques will be vital in future memories. To develop such approaches, a framework is needed to conduct flexible and repeatable experiments. This paper describes such a framework, StealthWorks, which facilitates research on software resilience by behaviorally emulating memory errors in a live system. We illustrate its use in studying program tolerance to random errors and in developing a new software technique that continuously tests memory for errors.
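
To make the two uses named in the abstract concrete, here is a toy sketch of random error injection followed by a pattern-based memory check. It is not the StealthWorks implementation, which the abstract does not describe in detail. It assumes a plain Python `bytearray` as a stand-in for a memory region, and the names `flip_random_bit`, `find_corrupted_bytes`, and `PATTERN` are hypothetical, introduced only for illustration.

```python
import random


def flip_random_bit(buf: bytearray, rng: random.Random) -> tuple[int, int]:
    """Flip one randomly chosen bit of buf in place.

    Returns (byte_index, bit_index) so the injected error can be logged
    and the experiment repeated.
    """
    byte_index = rng.randrange(len(buf))
    bit_index = rng.randrange(8)
    buf[byte_index] ^= 1 << bit_index
    return byte_index, bit_index


def find_corrupted_bytes(buf: bytearray, pattern: int) -> list[int]:
    """Return the indices whose value differs from the expected test pattern."""
    return [i for i, value in enumerate(buf) if value != pattern]


# Assumption: a 64-byte region filled with a known test pattern.
PATTERN = 0xAA
memory = bytearray([PATTERN] * 64)

# A fixed seed keeps the injected error repeatable across runs.
rng = random.Random(0)
injected = flip_random_bit(memory, rng)

# A memory tester would rescan the region and report any mismatches.
corrupted = find_corrupted_bytes(memory, PATTERN)
print("injected at", injected, "detected at", corrupted)
```

Seeding the generator is what makes the experiment repeatable: the same seed injects the same error, so a program's reaction to it can be observed more than once.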