Fault injection-based assessment of partial fault tolerance in stream processing applications
Proceedings of the 5th ACM international conference on Distributed event-based system
Distributed middleware reliability and fault tolerance support in system S
Proceedings of the 5th ACM international conference on Distributed event-based system
Hi-index | 0.00 |
This paper describes a modeling framework for evaluating the impact of faults on the output of streaming applications. Our model is based on three abstractions: stream operators, stream connections, and tuples. By composing these abstractions within a Stochastic Activity Network, we allow the modeling of complete applications. We consider faults that lead to data loss and to silent data corruption (SDC). Our framework captures how faults originating in one operator propagate to other operators down the stream processing graph. We demonstrate the extensibility of our framework by evaluating three different fault tolerance techniques: checkpointing, partial graph replication, and full graph replication. We show that under crashes that lead to data loss, partial graph replication has a great advantage in maintaining the accuracy of the application output when compared to checkpointing. We also show that SDC can break the no data duplication guarantees of a full graph replication-based fault tolerance technique.