Exascale computing technology challenges

Authors:
John Shalf; Sudip Dosanjh; John Morrison
Affiliations:
NERSC Division, Lawrence Berkeley National Laboratory, Berkeley, California; Sandia National Laboratories, New Mexico; Los Alamos National Laboratory, Los Alamos, New Mexico
Venue:
VECPAR'10 Proceedings of the 9th international conference on High performance computing for computational science
Year:
2010



Abstract

High-performance computing (HPC) architectures are expected to change dramatically in the next decade as power and cooling constraints limit increases in microprocessor clock speeds. Consequently, computer companies are sharply increasing on-chip parallelism to improve performance. The traditional doubling of clock speeds every 18 to 24 months is being replaced by a doubling of cores or other parallelism mechanisms. During the next decade, the amount of parallelism on a single microprocessor will rival the number of nodes in the early massively parallel supercomputers built in the 1980s. Applications and algorithms will need to adapt as node architectures evolve; in particular, they will need to manage locality to achieve performance. A key element of the strategy going forward is the co-design of applications, architectures, and programming environments. There is an unprecedented opportunity for application and algorithm developers to influence the direction of future architectures so that they meet DOE mission needs. This article describes the technology challenges on the road to exascale, their underlying causes, and their effect on the future of HPC system design.