In the early 1990s, researchers at Sandia National Laboratories and the University of New Mexico began development of customized system software for massively parallel ‘capability’ computing platforms. These lightweight kernels have proven to be essential for delivering the full power of the underlying hardware to applications. This claim is underscored by the success of several supercomputers, including the Intel Paragon, Intel Accelerated Strategic Computing Initiative Red, and the Cray XT series of systems, each of which established a new standard for high-performance computing upon introduction. In this paper, we describe our approach to lightweight compute node kernel design and discuss the design principles that have guided several generations of implementation and deployment. A broad strategy of operating system specialization has led to a focus on user-level resource management, deterministic behavior, and scalable system services. The relative importance of each of these areas has shifted over the years in response to changes in applications and in hardware and system architecture. We detail our approach and the associated principles, describe how our application of these principles has changed over time, and provide design and performance comparisons to contemporaneous supercomputing operating systems. Copyright © 2008 John Wiley & Sons, Ltd.