Operating system (OS) noise, or jitter, is a key limiter of application scalability in high-end computing systems. Several studies have attempted to quantify the sources and effects of system interference, but few show how architectural and system characteristics influence the impact of noise at scale. In this paper, we examine the impact of three such system properties: platform balance, noisy node distribution, and the choice of collective algorithm. Using a previously developed noise injection tool, we explore how the impact of noise varies with these platform characteristics. We provide detailed performance results indicating that a system with relatively less network bandwidth is able to absorb more noise than a system with more network bandwidth. Our results also show that application performance can be significantly degraded when only a subset of nodes is noisy. The placement of the noisy nodes also matters, especially for applications that make substantial use of tree-based collective communication operations. Lastly, performance results indicate that non-blocking collective operations can greatly mitigate the impact of OS interference. Taken together, these results show that the impact of OS noise is not solely a property of application communication behavior, but is also influenced by other properties of the system architecture and system software environment.
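The claim that a subset of noisy nodes can degrade the whole application can be illustrated with a toy model. The sketch below is not the noise injection tool used in the paper; it assumes a bulk-synchronous loop in which every iteration ends with a blocking collective, so each iteration lasts as long as the slowest node. The function name `simulate` and all parameters (`noise_prob`, `noise_len`, the node counts) are hypothetical values chosen for illustration. The model ignores network balance, node placement in a collective tree, and non-blocking collectives.

```python
import random


def simulate(num_nodes, noisy_nodes, iterations, compute=1.0,
             noise_prob=0.1, noise_len=0.5, seed=0):
    """Return the total time of a toy bulk-synchronous loop.

    Assumption: nodes 0..noisy_nodes-1 are each hit by one noise event of
    length noise_len with probability noise_prob per iteration. A blocking
    collective ends every iteration, so the iteration takes as long as the
    slowest node.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(iterations):
        slowest = compute
        for node in range(num_nodes):
            # Only the noisy subset draws a random number.
            if node < noisy_nodes and rng.random() < noise_prob:
                slowest = compute + noise_len
        total += slowest
    return total


if __name__ == "__main__":
    for noisy in (0, 8, 64):
        t = simulate(num_nodes=64, noisy_nodes=noisy, iterations=1000)
        print(f"{noisy:2d} noisy nodes of 64: total time {t:.1f}")
```

In this model, the slowdown grows with the probability that at least one noisy node is interrupted in a given iteration, not with the fraction of nodes that are noisy, which is why a small noisy subset can already account for a large share of the worst-case delay.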