Adaptive event prediction strategy with dynamic time window for large-scale HPC systems

  • Authors:
  • Ana Gainaru;Franck Cappello;Joshi Fullop;Stefan Trausan-Matu;William Kramer

  • Affiliations:
  • UIUC, NCSA, Urbana, IL and UPB, Bucharest, Romania;INRIA, France and UIUC, Urbana, IL;UIUC, NCSA, Urbana, IL;UPB, Bucharest, Romania;UIUC, NCSA, Urbana, IL

  • Venue:
  • SLAML '11 Managing Large-scale Systems via the Analysis of System Logs and the Application of Machine Learning Techniques
  • Year:
  • 2011

Abstract

In this paper, we analyse messages generated by different large-scale HPC systems in order to extract sequences of correlated events, which we later use to predict both the normal and the faulty behaviour of the system. Our method uses a dynamic window strategy that is able to find frequent sequences of events regardless of the time delay between them. Most current related research restricts correlation extraction to fixed and relatively small time windows that do not reflect the whole behaviour of the system. The events generated change constantly during the lifetime of the machine. We therefore consider it important to update the sequences at runtime, applying modifications after each prediction phase according to the forecast's accuracy and the difference between what was expected and what actually happened. Our experiments show that our analysis system is able to predict around 60% of events with a precision of around 85% at a lower event granularity than before.
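
As an illustration only, the following minimal Python sketch shows one way the two ideas above could fit together: adjusting a correlated sequence after each prediction phase, and measuring the share of events predicted (recall) and the precision of the forecasts. It is not the authors' implementation. The names SequenceRule, update_rule and precision_recall, the hit/miss counters, and the event labels are all assumptions introduced for this example.

```python
from dataclasses import dataclass


@dataclass
class SequenceRule:
    """Hypothetical correlated pair: `trigger` is expected to be followed by `target`."""
    trigger: str
    target: str
    hits: int = 0    # prediction phases in which the target really occurred
    misses: int = 0  # prediction phases in which it did not

    @property
    def confidence(self) -> float:
        # Fraction of past forecasts from this rule that turned out correct.
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


def update_rule(rule: SequenceRule, observed_events: set) -> None:
    """Compare the forecast with what actually happened and adjust the rule."""
    if rule.target in observed_events:
        rule.hits += 1
    else:
        rule.misses += 1


def precision_recall(predicted, occurred):
    """Precision: correct forecasts / all forecasts.
    Recall: correctly forecast events / all events that occurred."""
    predicted, occurred = set(predicted), set(occurred)
    true_positives = len(predicted & occurred)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(occurred) if occurred else 0.0
    return precision, recall


# Assumed event labels, for demonstration only.
rule = SequenceRule(trigger="ecc_warning", target="node_down")
update_rule(rule, {"node_down"})   # forecast confirmed
update_rule(rule, {"fan_alert"})   # forecast not confirmed
precision, recall = precision_recall(["a", "b"], ["a", "c", "d"])
```

In a sketch like this, a rule whose confidence drops below some chosen threshold could be dropped or re-mined at runtime. The threshold and the re-mining step are left out because the abstract does not specify them.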