Scientific applications are interrupted by the operating system far too often. Historically, operating systems have been optimized to time-share a single resource, the CPU. We now have an abundance of cores, yet we still swap out the application to run other tasks, thereby increasing the application's time to solution. Moreover, in parallel applications the probability that one task reaches a synchronization point late because of such an interrupt grows with system scale, which further increases the application's turn-around time. This paper reviews measures that reduce application interruption using only compile-time and run-time configuration of a recent, unmodified Linux kernel. Although these measures have been available for some time, to the best of the authors' knowledge they have not previously been applied in a high-performance computing context. We then introduce our invasive method, in which we remove the involuntary preemption induced by task scheduling. Our experiments show that parallel applications benefit from these modifications even at relatively small scales: at the modest scale of our testbed, we see a 1.91% improvement in a bulk-synchronous-parallel application, which should translate into larger benefits at extreme scales.
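As a concrete illustration of the kind of run-time configuration the abstract alludes to, the minimal C sketch below pins the calling process to a single core with sched_setaffinity so the scheduler does not migrate it. The chosen core number and the kernel boot parameters mentioned in the comments (isolcpus=, nohz_full=) are illustrative assumptions, not details taken from the paper.

#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    /* Hypothetical example: core 3, e.g. a core listed in "isolcpus=3" or
       "nohz_full=3" at boot; the paper's actual setup is not shown here. */
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(3, &set);

    /* Restrict this process (pid 0 = self) to the chosen core. */
    if (sched_setaffinity(0, sizeof(set), &set) != 0) {
        perror("sched_setaffinity");
        return EXIT_FAILURE;
    }

    /* The compute phase would run here, confined to core 3 and therefore
       less exposed to migrations and to tasks running on other cores. */
    printf("pinned to core 3\n");
    return EXIT_SUCCESS;
}

Pinning alone does not remove every interruption, since timer ticks and kernel threads may still run on the pinned core, which is why the invasive approach described above targets the involuntary preemption introduced by the task scheduler itself.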