Declarative failure recovery for sensor networks

Authors:
Ramakrishna Gummadi; Nupur Kothari; Todd Millstein; Ramesh Govindan
Affiliations:
University of Southern California; University of Southern California; University of California, Los Angeles; University of Southern California
Venue:
Proceedings of the 6th international conference on Aspect-oriented software development
Year:
2007

Abstract

Wireless sensor networks consist of distributed sensors embedded in the physical world and promise to allow observation of previously unobservable phenomena. Because they are exposed to unpredictable environments, sensor-network applications must handle a wide variety of faults: software errors, node and link failures, and network partitions. Code that manually detects and recovers from faults crosscuts the entire application, is tedious to implement correctly and efficiently, and is fragile in the face of program modifications. We investigate language support for managing faults modularly. Our insight is that such support can be provided naturally as an extension to existing "macroprogramming" systems for sensor networks. In such a system, a programmer describes a sensor-network application as a centralized program, and a compiler then produces equivalent node-level programs. We describe a simple checkpoint API for macroprograms, which can be implemented automatically in a distributed fashion across the network. We also describe declarative annotations that allow programmers to specify checkpointing strategies at a higher level of abstraction. We have implemented our approach in the Kairos macroprogramming system. Experiments show that it improves application availability by an order of magnitude while incurring low messaging overhead.
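
The abstract does not spell out the checkpoint API or the annotation syntax, so the following is only an illustrative sketch of the general idea, not the paper's or Kairos's actual interface. It assumes a single-process stand-in for a macroprogram; the names CheckpointStore, checkpoint, rollback, recoverable, and NodeFailure are all hypothetical. The explicit checkpoint/rollback calls stand in for a low-level API, and the decorator stands in for a declarative annotation that applies a checkpointing strategy without recovery code inside the application logic.

```python
import copy


class NodeFailure(Exception):
    """Hypothetical stand-in for a node, link, or software fault."""


class CheckpointStore:
    """Hypothetical checkpoint API: named snapshots of program state."""

    def __init__(self):
        self._snapshots = {}

    def checkpoint(self, name, state):
        # Deep-copy so later mutation of `state` cannot alter the snapshot.
        self._snapshots[name] = copy.deepcopy(state)

    def rollback(self, name):
        # Return a fresh copy so the snapshot survives repeated rollbacks.
        return copy.deepcopy(self._snapshots[name])


def recoverable(store, name, max_retries=3):
    """Hypothetical annotation: checkpoint on entry, roll back and retry on a fault."""

    def decorate(step):
        def wrapper(state):
            store.checkpoint(name, state)
            for _ in range(max_retries + 1):
                try:
                    return step(state)
                except NodeFailure:
                    # Discard partial updates made by the failed attempt.
                    state = store.rollback(name)
            raise NodeFailure(f"{name}: giving up after {max_retries} retries")

        return wrapper

    return decorate


store = CheckpointStore()
faults = iter([True, False])  # simulated: first attempt fails, second succeeds


@recoverable(store, "aggregate")
def aggregate(state):
    state["sum"] += sum(state["readings"])  # partial update before the fault
    if next(faults):
        raise NodeFailure("simulated unreachable node")
    state["rounds"] += 1
    return state


result = aggregate({"sum": 0, "readings": [2, 3, 5], "rounds": 0})
print(result)  # {'sum': 10, 'readings': [2, 3, 5], 'rounds': 1}
```

Without the rollback, the retried attempt would add the readings a second time and report a sum of 20. In the system the abstract describes, the checkpointing is carried out in a distributed fashion across the network; this sketch keeps everything in one process purely for illustration.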