Improving the Scalability of Parallel Jobs by adding Parallel Awareness to the Operating System
Proceedings of the 2003 ACM/IEEE conference on Supercomputing
Proceedings of the 2003 ACM/IEEE conference on Supercomputing
System noise, OS clock ticks, and fine-grained parallel applications
Proceedings of the 19th annual international conference on Supercomputing
Analysis of microbenchmarks for performance tuning of clusters
CLUSTER '04 Proceedings of the 2004 IEEE International Conference on Cluster Computing
Operating system issues for petascale systems
ACM SIGOPS Operating Systems Review
Network performance impact of a lightweight Linux for Cray XT3 compute nodes
Proceedings of the 2006 ACM/IEEE conference on Supercomputing
Blue Gene/L programming and operating environment
IBM Journal of Research and Development
Performance variability of highly parallel architectures
ICCS'03 Proceedings of the 2003 international conference on Computational science: PartIII
Impact of noise on scaling of collectives: an empirical evaluation
HiPC'06 Proceedings of the 13th international conference on High Performance Computing
The impact of noise on the scaling of collectives: a theoretical approach
HiPC'05 Proceedings of the 12th international conference on High Performance Computing
ZOID: I/O-forwarding infrastructure for petascale architectures
Proceedings of the 13th ACM SIGPLAN Symposium on Principles and practice of parallel programming
Proceedings of the 2007 ACM/IEEE conference on Supercomputing
Scalable Time Warp on Blue Gene Supercomputers
PADS '09 Proceedings of the 2009 ACM/IEEE/SCS 23rd Workshop on Principles of Advanced and Distributed Simulation
Providing a cloud network infrastructure on a supercomputer
Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing
Performance and Scalability Evaluation of 'Big Memory' on Blue Gene Linux
International Journal of High Performance Computing Applications
A light-weight virtual machine monitor for Blue Gene/P
Proceedings of the 1st International Workshop on Runtime and Operating Systems for Supercomputers
Thread Tranquilizer: Dynamically reducing performance variation
ACM Transactions on Architecture and Code Optimization (TACO) - HIPEAC Papers
A lightweight virtual machine monitor for Blue Gene/P
International Journal of High Performance Computing Applications
On deciding between conservative and optimistic approaches on massively parallel platforms
Proceedings of the Winter Simulation Conference
Warp speed: executing time warp on 1,966,080 cores
Proceedings of the 2013 ACM SIGSIM conference on Principles of advanced discrete simulation
Hi-index | 0.00 |
We investigate operating system noise, which we identify as one of the main reasons for a lack of synchronicity in parallel applications. Using a microbenchmark, we measure the noise on several contemporary platforms and find that, even with a general-purpose operating system, noise can be limited if certain precautions are taken. We then inject artificially generated noise into a massively parallel system and measure its influence on the performance of collective operations. Our experiments indicate that on extreme-scale platforms, the performance is correlated with the largest interruption to the application, even if the probability of such an interruption on a single process is extremely small. We demonstrate that synchronizing the noise can significantly reduce its negative influence.