Online Failure Forecast for Fault-Tolerant Data Stream Processing

  • Authors:
  • Xiaohui Gu; Spiros Papadimitriou; Philip S. Yu; Shu-Ping Chang

  • Affiliations:
  • North Carolina State University, Raleigh, NC. gu@csc.ncsu.edu
  • IBM T.J. Watson Research Center, Hawthorne, NY. spapadim@us.ibm.com
  • University of Illinois at Chicago, Chicago, IL. psyu@cs.uic.edu
  • IBM T.J. Watson Research Center, Hawthorne, NY. spchang@us.ibm.com

  • Venue:
  • ICDE '08 Proceedings of the 2008 IEEE 24th International Conference on Data Engineering
  • Year:
  • 2008

Abstract

In this paper, we present a new online failure forecast system that enables predictive failure management for fault-tolerant data stream processing. Unlike previous reactive or proactive approaches, predictive failure management uses failure forecasts to perform informed, just-in-time preventive actions only on abnormal components. We employ stream-based online learning methods to continuously classify the runtime state of each operator as normal, alert, or failure, based on collected feature streams. We have implemented the online failure forecast system as part of the IBM System S stream processing system. Our experiments show that the online failure forecast system can achieve good prediction accuracy for a range of stream processing software failures while imposing low overhead on the stream system.
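The abstract does not name the learning algorithm, so the following is only a minimal sketch of the general idea: an online classifier that absorbs labeled feature vectors one at a time and labels each operator's current state as normal, alert, or failure. It assumes an incremental Gaussian naive Bayes model (a swapped-in technique, not necessarily the authors' method), assumes labeled samples are available from past runs, and uses hypothetical names (`OnlineGaussianNB`, `process_stream`) and hypothetical features (CPU utilization, fraction of free memory).

```python
import math

# Operator states from the abstract; everything else below is a hypothetical sketch.
STATES = ("normal", "alert", "failure")


class OnlineGaussianNB:
    """Incremental Gaussian naive Bayes over fixed-length feature vectors.

    Per-class, per-feature mean and variance are maintained with Welford's
    algorithm, so each labeled sample is absorbed in O(d) time without
    storing history, as a stream-based learner must.
    """

    def __init__(self, n_features, var_floor=1e-3):
        self.n_features = n_features
        self.var_floor = var_floor  # avoids zero variance early in the stream
        self.count = {s: 0 for s in STATES}
        self.mean = {s: [0.0] * n_features for s in STATES}
        self.m2 = {s: [0.0] * n_features for s in STATES}

    def update(self, x, label):
        """Absorb one labeled feature vector (Welford update)."""
        self.count[label] += 1
        n = self.count[label]
        mu, m2 = self.mean[label], self.m2[label]
        for i, v in enumerate(x):
            delta = v - mu[i]
            mu[i] += delta / n
            m2[i] += delta * (v - mu[i])

    def _log_posterior(self, x, s):
        """Unnormalized log P(s | x) = log P(s) + sum_i log N(x_i; mu, var)."""
        total = sum(self.count.values())
        n = self.count[s]
        score = math.log(n / total)
        for i, v in enumerate(x):
            var = max(self.m2[s][i] / n, self.var_floor)
            score -= 0.5 * math.log(2 * math.pi * var) + (v - self.mean[s][i]) ** 2 / (2 * var)
        return score

    def predict(self, x):
        """Return the most probable state, or None before any training."""
        seen = [s for s in STATES if self.count[s] > 0]
        if not seen:
            return None
        return max(seen, key=lambda s: self._log_posterior(x, s))


def process_stream(model, records):
    """Hypothetical driver: records are (operator_id, features, label_or_None).

    Labeled records train the model; unlabeled ones are classified, and only
    operators predicted abnormal are returned as candidates for preventive action.
    """
    actions = []
    for op_id, x, label in records:
        if label is not None:
            model.update(x, label)
        else:
            state = model.predict(x)
            if state in ("alert", "failure"):
                actions.append((op_id, state))
    return actions


if __name__ == "__main__":
    # Hypothetical features: [cpu_utilization, fraction_of_memory_free]
    train = [
        ("op1", [0.20, 0.80], "normal"), ("op1", [0.25, 0.75], "normal"),
        ("op1", [0.15, 0.85], "normal"), ("op2", [0.60, 0.40], "alert"),
        ("op2", [0.65, 0.35], "alert"), ("op2", [0.55, 0.45], "alert"),
        ("op3", [0.95, 0.05], "failure"), ("op3", [0.90, 0.10], "failure"),
        ("op3", [1.00, 0.00], "failure"),
    ]
    live = [("op4", [0.22, 0.78], None), ("op5", [0.58, 0.42], None)]
    print(process_stream(OnlineGaussianNB(2), train + live))
```

Running the file prints `[('op5', 'alert')]`: only the operator classified as abnormal is flagged, which mirrors the idea of acting on abnormal components only. Welford's update keeps memory constant per class and feature, which suits a learner that must keep pace with continuous feature streams.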