Optimal placement of vertical connections in 3D Network-on-Chip
Journal of Systems Architecture: the EUROMICRO Journal
While Moore's Law has continued to provide smaller semiconductor devices, the effective end of uniprocessor performance scaling has (finally) driven mainstream computing to adopt parallel hardware and software. Owing to their derivation from high-performance programmable graphics architectures, modern GPUs have emerged as the world's most successful parallel architecture. Today, a single GPU delivers a peak performance of over 650 GFlops and 175 GBytes/second of memory bandwidth. This combination of high compute density and energy efficiency (GFlops/Watt) has motivated the world's fastest supercomputers to employ GPUs, including 3 of the top 5 on the June 2011 Top 500 list. This presentation will first describe the fundamentals of contemporary GPU architectures and the high-performance systems built around them. I will then highlight three substantial challenges facing the design of future parallel computing systems on the road to Exascale: (1) the power wall, (2) the bandwidth wall, and (3) the programming wall. Finally, I will describe NVIDIA's Echelon research project, which is developing architectures and programming systems that aim to address these challenges and drive continued performance scaling of parallel computing from embedded systems to supercomputers.