Operating system issues for petascale systems

Authors:
Pete Beckman; Kamil Iskra; Kazutomo Yoshii; Susan Coghlan
Affiliations:
Argonne National Laboratory, Argonne, IL; Argonne National Laboratory, Argonne, IL; Argonne National Laboratory, Argonne, IL; Argonne National Laboratory, Argonne, IL
Venue:
ACM SIGOPS Operating Systems Review
Year:
2006

Abstract

Petascale supercomputers will be available by 2008. The largest of these complex, leadership-class machines will probably have nearly 250,000 CPUs. Such massively parallel systems pose a number of challenging operating system issues. In this paper, we focus on the issues most important for the system that will first break the petaflop barrier: synchronization and collective operations, parallel I/O, and fault tolerance.