The performance of parallel applications running on large clusters is known to degrade because of interference from kernel and daemon activity on individual nodes, commonly called noise. In this paper, we focus on an important class of parallel applications that repeatedly perform a computation phase followed by a collective operation such as a barrier. We model this behavior theoretically and rigorously demonstrate the effect of noise on the scalability of such applications. We study three natural and important classes of noise distributions: the exponential distribution, the heavy-tailed distribution, and the Bernoulli distribution. We show that such systems scale well in the presence of exponential noise, but that performance degrades drastically in the presence of heavy-tailed or Bernoulli noise.
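The compute-then-barrier pattern described above can be illustrated with a small Monte Carlo sketch. This is not the paper's theoretical model; it is a minimal simulation that assumes each node's phase time is a fixed amount of work plus an independent noise sample, and that the barrier completes when the slowest node arrives. All function names and parameter values (mean noise, Pareto shape, Bernoulli probability and delay) are hypothetical choices made for illustration only.

```python
import random


def iteration_time(n_nodes, work, noise, rng):
    """One compute phase followed by a barrier.

    Assumption: the barrier finishes when the slowest node arrives,
    so the iteration time is the maximum over all nodes.
    """
    return max(work + noise(rng) for _ in range(n_nodes))


def mean_iteration_time(n_nodes, work, noise, iterations=200, seed=0):
    """Average iteration time over several compute/barrier rounds."""
    rng = random.Random(seed)
    total = sum(iteration_time(n_nodes, work, noise, rng)
                for _ in range(iterations))
    return total / iterations


# Hypothetical noise samplers; each returns a function rng -> delay >= 0.
def exponential(mean):
    return lambda rng: rng.expovariate(1.0 / mean)


def heavy_tailed(alpha, scale):
    # Pareto samples are >= 1, so subtract 1 to allow zero delay.
    return lambda rng: scale * (rng.paretovariate(alpha) - 1.0)


def bernoulli(p, delay):
    # With probability p the node is hit by a fixed delay, else none.
    return lambda rng: delay if rng.random() < p else 0.0


if __name__ == "__main__":
    work = 1.0  # illustrative compute time per phase
    models = {
        "exponential": exponential(mean=0.01),
        "heavy-tailed": heavy_tailed(alpha=1.5, scale=0.01),
        "bernoulli": bernoulli(p=0.01, delay=1.0),
    }
    for name, noise in models.items():
        for n in (1, 16, 256, 1024):
            t = mean_iteration_time(n, work, noise)
            print(f"{name:>12}  nodes={n:<5} mean iteration time={t:.3f}")
```

Running the script prints the mean iteration time for each noise model at increasing node counts, which gives a rough way to compare how the per-iteration cost grows with system size under each distribution. The numbers depend entirely on the assumed parameters and should not be read as reproducing the paper's results.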