The impact of system design parameters on application noise sensitivity

  • Authors:
  • Kurt B. Ferreira; Patrick G. Bridges; Ron Brightwell; Kevin T. Pedretti

  • Affiliations:
  • Scalable System Software Department, Sandia National Laboratories, Albuquerque, USA 87185-1319 and Computer Science Department, The University of New Mexico, Albuquerque, USA 87131; Computer Science Department, The University of New Mexico, Albuquerque, USA 87131; Scalable System Software Department, Sandia National Laboratories, Albuquerque, USA 87185-1319; Scalable System Software Department, Sandia National Laboratories, Albuquerque, USA 87185-1319

  • Venue:
  • Cluster Computing
  • Year:
  • 2013

Abstract

Operating system (OS) noise, or jitter, is a key limiter of application scalability in high-end computing systems. Several studies have attempted to quantify the sources and effects of system interference, though few of them show how architectural and system characteristics influence the impact of noise at scale. In this paper, we examine the impact of three such system properties: platform balance, noisy node distribution, and the choice of collective algorithm. Using a previously developed noise injection tool, we explore how the impact of noise varies with these platform characteristics. We provide detailed performance results indicating that a system with relatively less network bandwidth can absorb more noise than a system with more network bandwidth. Our results also show that application performance can be significantly degraded when only a subset of nodes is noisy. Furthermore, the placement of the noisy nodes is also important, especially for applications that make substantial use of tree-based collective communication operations. Lastly, performance results indicate that non-blocking collective operations can greatly mitigate the impact of OS interference. Taken together, these results show that the impact of OS noise is not solely a property of application communication behavior, but is also influenced by other properties of the system architecture and system software environment.
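
The observation that a subset of noisy nodes can degrade a whole application can be illustrated with a toy model. The sketch below is not the paper's noise injection tool or methodology; it assumes a hypothetical bulk-synchronous application in which every step ends in a blocking collective, so each step lasts as long as the slowest node. The function name `simulate_bsp` and all of its parameters are illustrative assumptions.

```python
import random


def simulate_bsp(num_nodes, noisy_nodes, steps, work, detour, prob, seed=0):
    """Return the total runtime of a toy bulk-synchronous application.

    Assumed model (not taken from the paper): in each step every node
    computes for `work` seconds, and each node in `noisy_nodes` also
    loses `detour` seconds with probability `prob`. A blocking
    collective ends the step, so the step costs the slowest node's time.
    """
    rng = random.Random(seed)
    noisy = set(noisy_nodes)
    total = 0.0
    for _ in range(steps):
        slowest = work
        for node in range(num_nodes):
            # Only noisy nodes can be interrupted during this step.
            hit = node in noisy and rng.random() < prob
            slowest = max(slowest, work + (detour if hit else 0.0))
        total += slowest
    return total


if __name__ == "__main__":
    quiet = simulate_bsp(64, [], 100, 1.0, 0.25, 0.1)
    one_noisy = simulate_bsp(64, [0], 100, 1.0, 0.25, 0.1)
    all_noisy = simulate_bsp(64, range(64), 100, 1.0, 0.25, 0.1)
    print(quiet, one_noisy, all_noisy)
```

In this model a single node that is interrupted on every step slows the run exactly as much as interrupting all nodes would, because the blocking collective forces everyone to wait for the slowest participant. The model ignores network bandwidth, node placement, and collective algorithm, which are the properties the paper actually studies.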