GPU computing and the road to extreme-scale parallel systems

  • Author: Stephen W. Keckler
  • Affiliations: The University of Texas at Austin; NVIDIA, USA
  • Venue: IISWC '11: Proceedings of the 2011 IEEE International Symposium on Workload Characterization
  • Year: 2011


Abstract

While Moore's Law has continued to provide smaller semiconductor devices, the effective end of uniprocessor performance scaling has (finally) pushed mainstream computing to adopt parallel hardware and software. Derived from high-performance programmable graphics architectures, modern GPUs have emerged as the world's most successful parallel architecture. Today, a single GPU has a peak performance of over 650 GFlops and 175 GBytes/second of memory bandwidth. The combination of high compute density and energy efficiency (GFlops/Watt) has motivated the world's fastest supercomputers to employ GPUs, including three of the top five systems on the June 2011 TOP500 list. This presentation will first describe the fundamentals of contemporary GPU architectures and the high-performance systems built around them. I will then highlight three substantial challenges facing the design of future parallel computing systems on the road to Exascale: (1) the power wall, (2) the bandwidth wall, and (3) the programming wall. Finally, I will describe NVIDIA's Echelon research project, which is developing architectures and programming systems that aim to address these challenges and drive continued performance scaling of parallel computing from embedded systems to supercomputers.