Reliability is critical to many applications built on distributed event processing systems. Stateful operators in particular make it challenging to recover efficiently from failures and to ensure consistent event streams. Even during failure-free execution, state-of-the-art reliability methods incur significant run-time overhead in computational resources, event traffic, and event detection time. This paper proposes a novel rollback-recovery method that tolerates multiple simultaneous operator failures while eliminating the need for persistent checkpoints. Operator state is preserved in savepoints, taken at points in time when the operator's execution depends solely on the state of its incoming event streams, which predecessor operators can reproduce. We propose an expressive event processing model to determine savepoints, and algorithms to coordinate them in a distributed operator network. Evaluations show that the method achieves very low overhead during failure-free execution compared with other approaches.
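The savepoint idea can be illustrated with a minimal sketch. It is not the paper's algorithm or event processing model. It assumes a single predecessor, a hypothetical pair-summing operator whose state is empty after every second event, and in-memory retention of events at the predecessor. The names `Predecessor`, `PairSum`, `acknowledge`, and `run` are invented for this illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Predecessor:
    """Hypothetical upstream operator that retains the events it emitted
    since the successor's latest savepoint, so they can be reproduced."""
    log: list = field(default_factory=list)  # retained (seq, value) events
    next_seq: int = 0

    def emit(self, value):
        event = (self.next_seq, value)
        self.next_seq += 1
        self.log.append(event)
        return event

    def acknowledge(self, savepoint):
        # Events before the savepoint no longer influence the successor.
        self.log = [e for e in self.log if e[0] >= savepoint]

    def replay(self):
        return list(self.log)


class PairSum:
    """Hypothetical stateful operator: sums every two consecutive events."""

    def __init__(self):
        self.pending = []  # volatile state, lost when the operator fails

    def process(self, event):
        seq, value = event
        self.pending.append(value)
        if len(self.pending) < 2:
            return None, None
        total = sum(self.pending)
        self.pending = []
        # The state is empty again, so later output depends only on
        # events from seq + 1 onward: that position is the savepoint.
        return total, seq + 1


def run(values, crash_after=None):
    """Feed values through the operator, optionally failing it once."""
    pred, op, outputs = Predecessor(), PairSum(), []
    for i, value in enumerate(values):
        total, savepoint = op.process(pred.emit(value))
        if total is not None:
            outputs.append(total)
            pred.acknowledge(savepoint)
        if crash_after == i:
            # Recovery without a checkpoint: restart with empty state and
            # replay what the predecessor retained since the savepoint.
            op = PairSum()
            for event in pred.replay():
                op.process(event)
    return outputs
```

In this sketch, `run([1, 2, 3, 4, 5, 6])` and `run([1, 2, 3, 4, 5, 6], crash_after=4)` produce the same output, because the restarted operator rebuilds its volatile state purely from replayed input. The replay never emits duplicate output here only because fewer than two events are ever retained. A general operator would also need to suppress or deduplicate output regenerated during replay.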