Benchmarking the effects of operating system interference on extreme-scale parallel machines

  • Authors:
  • Pete Beckman; Kamil Iskra; Kazutomo Yoshii; Susan Coghlan; Aroon Nataraj

  • Affiliations:
  • Mathematics and Computer Science Division, Argonne National Laboratory, Argonne, USA 60439; Mathematics and Computer Science Division, Argonne National Laboratory, Argonne, USA 60439; Mathematics and Computer Science Division, Argonne National Laboratory, Argonne, USA 60439; Mathematics and Computer Science Division, Argonne National Laboratory, Argonne, USA 60439; Department of Computer and Information Science, University of Oregon, Eugene, USA 97403

  • Venue:
  • Cluster Computing
  • Year:
  • 2008

Abstract

We investigate operating system noise, which we identify as one of the main reasons for a lack of synchronicity in parallel applications. Using a microbenchmark, we measure the noise on several contemporary platforms and find that, even with a general-purpose operating system, noise can be limited if certain precautions are taken. We then inject artificially generated noise into a massively parallel system and measure its influence on the performance of collective operations. Our experiments indicate that on extreme-scale platforms, the performance is correlated with the largest interruption to the application, even if the probability of such an interruption on a single process is extremely small. We demonstrate that synchronizing the noise can significantly reduce its negative influence.
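The scaling argument in the abstract can be illustrated with a toy probability model. The sketch below is not the authors' microbenchmark or noise injector. It assumes a hypothetical compute-then-barrier step in which each process is interrupted at most once, for a fixed detour length, with independent probability p. The function name, parameters, and numeric values are all illustrative assumptions. Under this model the barrier waits for the slowest process, so with unsynchronized noise the step pays the full detour whenever any process is hit, while with synchronized noise all processes are hit in the same steps and the cost stays at the single-process level.

```python
def expected_step_time(work, detour, p, n_procs, synchronized=False):
    """Expected duration of one compute-then-barrier step (toy model).

    work         -- noise-free compute time per step (arbitrary units)
    detour       -- length of a single interruption (same units)
    p            -- probability that a given process is interrupted in a step
    n_procs      -- number of processes joined by the barrier
    synchronized -- if True, all processes are interrupted in the same steps
    """
    if synchronized:
        # Interruptions overlap, so the step is delayed with probability p.
        p_delayed = p
    else:
        # Independent noise: the step is delayed if at least one process is hit.
        p_delayed = 1.0 - (1.0 - p) ** n_procs
    return work + detour * p_delayed


if __name__ == "__main__":
    # Illustrative values only, not measurements from the paper.
    work, detour, p = 100.0, 50.0, 1e-5
    for n in (1, 1024, 65536, 1048576):
        unsync = expected_step_time(work, detour, p, n)
        sync = expected_step_time(work, detour, p, n, synchronized=True)
        print(f"n={n:>8}  unsynchronized={unsync:8.3f}  synchronized={sync:8.3f}")
```

In this model the unsynchronized cost approaches work + detour as the process count grows, however small p is, which mirrors the abstract's observation that performance tracks the largest interruption. The synchronized cost does not depend on the process count at all.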