Identifying HPC codes via performance logs and machine learning

  • Authors:
  • Orianna DeMasi; Taghrid Samak; David H. Bailey

  • Affiliations:
  • Lawrence Berkeley National Laboratory, Berkeley, USA; Lawrence Berkeley National Laboratory, Berkeley, USA; Lawrence Berkeley National Laboratory, Berkeley, USA

  • Venue:
  • Proceedings of the first workshop on Changing landscapes in HPC security
  • Year:
  • 2013

Abstract

We apply supervised learning to the large-scale analysis of performance logs, with two goals: accurately classifying code runs and understanding the relative importance of different performance metrics. Previous work has demonstrated that high-performance computing (HPC) codes exhibit structured communication patterns. By categorizing these patterns, we can identify which code was executed. The ability to identify a code by its performance profile is useful for specializing HPC security systems and for identifying common optimizations for similar codes. We train on an extensive data set of real user runs from a high-performance computing center, and we employ and modify a rule ensemble method to predict which code was run given a performance log. Without modification, this naive method achieves greater than 93% accuracy. When modified to allow an "other" class, accuracy increases to greater than 97%. This modification allows an anomalous run to be flagged as not belonging to any previously seen, or acceptable, code, which provides additional latitude in monitoring what is run on supercomputing facilities. We conclude by interpreting the resulting rule model, which indicates which components of a code are most distinctive and useful for identification.
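
The idea of an "other" class can be illustrated with a minimal sketch. This is not the authors' implementation: the metric names (mpi_time_frac, allreduce_calls, io_bytes), the code labels, the rule weights, and the threshold are all hypothetical, and in practice the rules would be learned from labeled performance logs rather than written by hand. The sketch only shows the general shape of the approach: weighted rules vote for a code, and a run whose best score falls below a threshold is flagged as "other" instead of being forced into a known class.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Metrics = Dict[str, float]


@dataclass
class Rule:
    code: str                            # code (class) this rule votes for
    weight: float                        # contribution when the rule fires
    condition: Callable[[Metrics], bool]


# Hypothetical hand-written rules; a real rule ensemble would learn these.
RULES: List[Rule] = [
    Rule("code_A", 1.0, lambda m: m["mpi_time_frac"] > 0.5),
    Rule("code_A", 0.5, lambda m: m["allreduce_calls"] > 1000),
    Rule("code_B", 1.0, lambda m: m["mpi_time_frac"] <= 0.2),
    Rule("code_B", 0.5, lambda m: m["io_bytes"] > 1e9),
]


def score(metrics: Metrics, rules: List[Rule]) -> Dict[str, float]:
    """Sum the weights of all firing rules, separately for each code."""
    scores: Dict[str, float] = {}
    for rule in rules:
        scores.setdefault(rule.code, 0.0)
        if rule.condition(metrics):
            scores[rule.code] += rule.weight
    return scores


def classify(metrics: Metrics, rules: List[Rule] = RULES,
             threshold: float = 1.0) -> str:
    """Return the best-scoring code, or "other" if no code is convincing.

    The threshold is an assumed tuning parameter, not a value from the paper.
    """
    scores = score(metrics, rules)
    best = max(scores, key=scores.get)
    return best if scores[best] >= threshold else "other"


if __name__ == "__main__":
    run = {"mpi_time_frac": 0.35, "allreduce_calls": 10, "io_bytes": 1e6}
    print(classify(run))  # no rule fires for this run, so it is flagged
```

Raising the threshold makes the classifier more conservative: more runs are flagged as "other" for review, at the cost of rejecting some runs of known codes.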