Fault injection-based assessment of partial fault tolerance in stream processing applications

Authors:
Gabriela Jacques-Silva; Bugra Gedik; Henrique Andrade; Kun-Lung Wu; Ravishankar K. Iyer
Affiliations:
University of Illinois at Urbana-Champaign, Urbana, IL & IBM T. J. Watson Research Center, Hawthorne, NY, USA; IBM T. J. Watson Research Center, Hawthorne, NY, USA; IBM T. J. Watson Research Center, Hawthorne, NY, USA; IBM Research, Hawthorne, NY, USA; University of Illinois at Urbana-Champaign, Urbana, IL, USA
Venue:
Proceedings of the 5th ACM International Conference on Distributed Event-Based Systems
Year:
2011


Abstract

This paper describes an experimental methodology for evaluating the effectiveness of partial fault tolerance (PFT) techniques in data stream processing applications. Without a clear understanding of how faults affect the quality of the application output, applying PFT techniques in practice is not viable. We assess the impact of PFT by injecting faults into a synthetic financial engineering application running on IBM's stream processing middleware, System S. Degradation in output quality is evaluated via an application-specific output score function. In addition, we propose four metrics for assessing the impact of faults in different stream operators of the application flow graph with respect to predictability and availability. These metrics help developers decide where in the application to place redundant resources. We show that PFT is indeed viable, which opens the way to considerably reducing resource consumption compared to fully consistent replicas.
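The general idea of scoring a faulty run against a fault-free run can be illustrated with a small sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the toy pipeline (`run_pipeline`), the score function (`output_score`), the fault model (`inject_tuple_loss`, which drops a contiguous span of tuples), and the `degradation` measure are hypothetical stand-ins, and they are not the four metrics or the System S application the abstract refers to.

```python
def run_pipeline(stream, window=4):
    """Toy application (hypothetical): average each tumbling window of prices."""
    outputs = []
    for i in range(0, len(stream), window):
        chunk = stream[i:i + window]
        outputs.append(sum(chunk) / len(chunk))
    return outputs


def output_score(outputs):
    """Hypothetical application-specific score: sum of all emitted values."""
    return sum(outputs)


def inject_tuple_loss(stream, start, duration):
    """Emulate a fault by dropping `duration` tuples beginning at index `start`."""
    return stream[:start] + stream[start + duration:]


def degradation(golden_score, faulty_score):
    """Relative deviation of the faulty run's score from the fault-free score."""
    return abs(golden_score - faulty_score) / abs(golden_score)


# Synthetic input data (assumption): 40 price tuples with a repeating pattern.
prices = [float(100 + (i % 7)) for i in range(40)]

# Fault-free reference ("golden") run.
golden_score = output_score(run_pipeline(prices))

# Inject the same fault at different points in the stream and compare scores.
for start in (0, 10, 20, 30):
    faulty_stream = inject_tuple_loss(prices, start, duration=8)
    faulty_score = output_score(run_pipeline(faulty_stream))
    print(start, round(degradation(golden_score, faulty_score), 4))
```

Repeating the injection at several points in the stream, as the loop does, gives a rough sense of how much the score varies with where the fault lands, which is the kind of question a fault injection campaign is meant to answer.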