Fault oblivious eXascale whitepaper

Authors:
Ronald G. Minnich; Curtis L. Janssen; Sriram Krishnamoorthy; Andres Marquez; Maya Gokhale; P. Sadayappan; Eric Van Hensbergen; Jim McKie; Jonathan Appavoo
Affiliations:
Sandia National Laboratories; Sandia National Laboratories; Pacific Northwest National Laboratory; Pacific Northwest National Laboratory; Lawrence Livermore National Laboratory; Ohio State University; IBM Research; Alcatel-Lucent Bell-Labs; Boston University
Venue:
Proceedings of the 1st International Workshop on Runtime and Operating Systems for Supercomputers
Year:
2011

Citing 4

Hypergraph partitioning for automatic memory hierarchy management. Proceedings of the 2006 ACM/IEEE conference on Supercomputing.
Automatic data movement and computation mapping for multi-level parallel architectures with explicitly managed memories. Proceedings of the 13th ACM SIGPLAN Symposium on Principles and practice of parallel programming.
A practical automatic polyhedral parallelizer and locality optimizer. Proceedings of the 2008 ACM SIGPLAN conference on Programming language design and implementation.
Automatic transformations for communication-minimized parallelization and locality optimization in the polyhedral model. CC'08/ETAPS'08: Proceedings of the Joint European Conferences on Theory and Practice of Software, 17th international conference on Compiler construction.

Abstract

Exascale computing systems will provide a thousand-fold increase in parallelism and a proportional increase in failure rate relative to today's machines [3]. Future systems are expected to feature billions of threads and tens of millions of CPUs. The nodes and networks of these systems will be hierarchical, and software that ignores this hardware hierarchy will utilize them poorly. Failure will be a constant companion, and checkpointing the entire system, with its petabytes of memory, is unlikely to be practical. Systems software for exascale machines must provide the infrastructure to support existing applications while simultaneously enabling efficient execution of new programming models that naturally express dynamic, adaptive, irregular computation; coupled simulations; and massive data analysis.