Surviving sensor network software faults

  • Authors:
  • Yang Chen; Omprakash Gnawali; Maria Kazandjieva; Philip Levis; John Regehr

  • Affiliations:
  • University of Utah, Salt Lake City, UT, USA; University of Southern California, Los Angeles, CA, USA; Stanford University, Stanford, CA, USA; Stanford University, Stanford, CA, USA; University of Utah, Salt Lake City, UT, USA

  • Venue:
  • Proceedings of the ACM SIGOPS 22nd Symposium on Operating Systems Principles
  • Year:
  • 2009

Abstract

We describe Neutron, a version of the TinyOS operating system that efficiently recovers from memory safety bugs. Where existing schemes reboot an entire node on an error, Neutron's compiler and runtime extensions divide programs into recovery units and reboot only the faulting unit. The TinyOS kernel itself is a recovery unit: a kernel safety violation appears to applications as the processor being unavailable for 10-20 milliseconds. Neutron further minimizes safety violation cost by supporting "precious" state that persists across reboots. Application data, time synchronization state, and routing tables can all be declared as precious. Neutron's reboot sequence conservatively checks that precious state is not the source of a fault before preserving it. Together, recovery units and precious state allow Neutron to reduce a safety violation's cost to time synchronization by 94% and to a routing protocol by 99.5%. Neutron also protects applications from losing data. Neutron provides this recovery on the very limited resources of a tiny, low-power microcontroller.