Dynamic routing of data stream tuples among parallel query plan running on multi-core processors
Distributed and Parallel Databases
Online black-box failure prediction for mission critical distributed systems
SAFECOMP'12 Proceedings of the 31st international conference on Computer Safety, Reliability, and Security
In this paper, we present a new online failure forecast system to achieve predictive failure management for fault-tolerant data stream processing. Unlike previous reactive or proactive approaches, predictive failure management uses failure forecasts to take informed, just-in-time preventive actions on abnormal components only. We employ stream-based online learning methods to continuously classify runtime operator state as normal, alert, or failure, based on collected feature streams. We have implemented the online failure forecast system as part of the IBM System S stream processing system. Our experiments show that the online failure forecast system can achieve good prediction accuracy for a range of stream processing software failures, while imposing low overhead on the stream system.
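To make the idea of continuously classifying operator state from a feature stream concrete, the following is a minimal sketch only. The abstract does not say which learning method the system uses, so an incremental nearest-centroid classifier is swapped in here purely for illustration. The class name `OnlineCentroidClassifier`, the two features (CPU utilization and queue length), and all sample values are hypothetical and not taken from the paper.

```python
from collections import defaultdict

# The three operator states named in the abstract.
STATES = ("normal", "alert", "failure")


class OnlineCentroidClassifier:
    """Incremental nearest-centroid classifier.

    Illustrative stand-in only; not the learning method used by the paper.
    """

    def __init__(self):
        self.counts = defaultdict(int)  # labeled samples seen per state
        self.centroids = {}             # running mean feature vector per state

    def update(self, features, state):
        """Fold one labeled feature vector into the running mean of its state."""
        if state not in STATES:
            raise ValueError(f"unknown state: {state}")
        self.counts[state] += 1
        n = self.counts[state]
        if state not in self.centroids:
            self.centroids[state] = [float(x) for x in features]
            return
        centroid = self.centroids[state]
        for i, x in enumerate(features):
            # Incremental mean: m_n = m_{n-1} + (x - m_{n-1}) / n
            centroid[i] += (x - centroid[i]) / n

    def classify(self, features):
        """Return the state whose centroid is nearest in squared Euclidean distance."""
        if not self.centroids:
            raise RuntimeError("no labeled samples seen yet")

        def squared_distance(state):
            return sum((x - c) ** 2
                       for x, c in zip(features, self.centroids[state]))

        return min(self.centroids, key=squared_distance)


if __name__ == "__main__":
    clf = OnlineCentroidClassifier()
    # Hypothetical labeled samples: (cpu_utilization, queue_length)
    labeled_stream = [
        ((0.20, 5), "normal"),
        ((0.30, 7), "normal"),
        ((0.70, 40), "alert"),
        ((0.95, 120), "failure"),
    ]
    for features, state in labeled_stream:
        clf.update(features, state)
    print(clf.classify((0.65, 45)))
```

Each update touches only one running mean, so the per-sample cost stays constant, which is in the spirit of the low-overhead goal stated above. A real deployment would need feature scaling and a way to obtain labels at runtime, neither of which this sketch addresses.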