Designing and implementing lightweight kernels for capability computing

Authors:
Rolf Riesen;Ron Brightwell;Patrick G. Bridges;Trammell Hudson;Arthur B. Maccabe;Patrick M. Widener;Kurt Ferreira
Affiliations:
Sandia National Laboratories, Albuquerque, NM 87185, U.S.A.;Sandia National Laboratories, Albuquerque, NM 87185, U.S.A.;Department of Computer Science, University of New Mexico, Albuquerque, NM 87131-1386, U.S.A.;OS Research, 1527 16th St. NW #5, Washington, DC 20036, U.S.A.;Department of Computer Science, University of New Mexico, Albuquerque, NM 87131-1386, U.S.A.;Department of Computer Science, University of New Mexico, Albuquerque, NM 87131-1386, U.S.A.;Sandia National Laboratories, Albuquerque, NM 87185, U.S.A.
Venue:
Concurrency and Computation: Practice & Experience
Year:
2009

Citing 0
Cited 8

Characterizing application sensitivity to OS interference using kernel-level noise injection
Proceedings of the 2008 ACM/IEEE conference on Supercomputing

Investigating virtual passthrough I/O on commodity devices
ACM SIGOPS Operating Systems Review

Experiences with a Lightweight Supercomputer Kernel: Lessons Learned from Blue Gene's CNK
Proceedings of the 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis

VM-based slack emulation of large-scale systems
Proceedings of the 1st International Workshop on Runtime and Operating Systems for Supercomputers

Improving per-node efficiency in the datacenter with new OS abstractions
Proceedings of the 2nd ACM Symposium on Cloud Computing

Virtual-machine-based emulation of future generation high-performance computing systems
International Journal of High Performance Computing Applications

Hobbes: composition and virtualization as the foundations of an extreme-scale OS/R
Proceedings of the 3rd International Workshop on Runtime and Operating Systems for Supercomputers

Trilinos I/O Support Trios
Scientific Programming - A New Overview of the Trilinos Project --Part 1

Abstract

In the early 1990s, researchers at Sandia National Laboratories and the University of New Mexico began development of customized system software for massively parallel ‘capability’ computing platforms. These lightweight kernels have proven to be essential for delivering the full power of the underlying hardware to applications. This claim is underscored by the success of several supercomputers, including the Intel Paragon, Intel Accelerated Strategic Computing Initiative Red, and the Cray XT series of systems, each having established a new standard for high-performance computing upon introduction. In this paper, we describe our approach to lightweight compute node kernel design and discuss the design principles that have guided several generations of implementation and deployment. A broad strategy of operating system specialization has led to a focus on user-level resource management, deterministic behavior, and scalable system services. The relative importance of each of these areas has changed over the years in response to changes in applications and in hardware and system architecture. We detail our approach and the associated principles, describe how our application of these principles has changed over time, and provide design and performance comparisons to contemporaneous supercomputing operating systems. Copyright © 2008 John Wiley & Sons, Ltd.