Wireless sensor networks consist of distributed sensors embedded in the physical world, and promise to allow observation of previously unobservable phenomena. Because they are exposed to unpredictable environments, sensor-network applications must handle a wide variety of faults: software errors, node and link failures, and network partitions. The code to manually detect and recover from faults crosscuts the entire application, is tedious to implement correctly and efficiently, and is fragile in the face of program modifications. We investigate language support for modularly managing faults. Our insight is that such support can be naturally provided as an extension to existing "macroprogramming" systems for sensor networks. In such a system, a programmer describes a sensor-network application as a centralized program; a compiler then produces equivalent node-level programs. We describe a simple checkpoint API for macroprograms, which can be automatically implemented in a distributed fashion across the network. We also describe declarative annotations that allow programmers to specify checkpointing strategies at a higher level of abstraction. We have implemented our approach in the Kairos macroprogramming system. Experiments show that it improves application availability by an order of magnitude while incurring low messaging overhead.
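To make the checkpoint idea concrete, the following is a minimal single-process sketch of checkpoint and rollback around a centralized program loop. It is not the Kairos API and not the distributed implementation described above: the names `CheckpointStore`, `NodeFault`, and `run_rounds` are hypothetical, the fault is injected artificially, and the "network" is just a list of readings.

```python
import copy


class NodeFault(Exception):
    """Hypothetical fault signal, standing in for a node or link failure."""


class CheckpointStore:
    """Holds one saved copy of the program state (assumed in-memory here)."""

    def __init__(self):
        self._saved = None

    def checkpoint(self, state):
        # Deep-copy so later mutations do not alter the saved state.
        self._saved = copy.deepcopy(state)

    def rollback(self):
        if self._saved is None:
            raise RuntimeError("no checkpoint has been taken")
        return copy.deepcopy(self._saved)


def run_rounds(readings, fail_at, store=None):
    """Sum readings round by round, checkpointing after each round.

    A fault is injected once, in the round with index `fail_at`, after the
    state has been partially updated. Recovery discards the partial update
    by rolling back to the last checkpoint and retrying the round.
    Returns (total, number_of_rollbacks).
    """
    store = store or CheckpointStore()
    state = {"round": 0, "total": 0}
    store.checkpoint(state)
    injected = False
    rollbacks = 0
    while state["round"] < len(readings):
        try:
            i = state["round"]
            state["total"] += readings[i]  # partial update of the state
            if i == fail_at and not injected:
                injected = True
                raise NodeFault(f"node lost in round {i}")
            state["round"] = i + 1
            store.checkpoint(state)  # round completed, make it durable
        except NodeFault:
            state = store.rollback()  # undo the partial update
            rollbacks += 1
    return state["total"], rollbacks


if __name__ == "__main__":
    print(run_rounds([1, 2, 3], fail_at=1))
```

In this sketch the rollback matters because the fault arrives after `total` has already been updated: retrying without restoring the checkpoint would count that reading twice. A declarative annotation, as mentioned in the abstract, would presumably let the programmer state where such checkpoints belong instead of writing the calls by hand; that part is not modeled here.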