A simplified TVD finite difference scheme via artificial viscosity
SIAM Journal on Scientific and Statistical Computing - Papers from the Second Conference on Parallel Processing for Scientific Computing
0.374 Pflop/s trillion-particle kinetic modeling of laser plasma interaction on Roadrunner
Proceedings of the 2008 ACM/IEEE conference on Supercomputing
Linux Journal
POWER4 system microarchitecture
IBM Journal of Research and Development
A Hybrid Programming Model for Compressible Gas Dynamics Using OpenCL
ICPPW '10 Proceedings of the 2010 39th International Conference on Parallel Processing Workshops
Automating topology aware mapping for supercomputers
Liszt: a domain specific language for building portable mesh-based PDE solvers
Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis
RGEM: A Responsive GPGPU Execution Model for Runtime Engines
RTSS '11 Proceedings of the 2011 IEEE 32nd Real-Time Systems Symposium
Gdev: first-class GPU resource management in the operating system
USENIX ATC'12 Proceedings of the 2012 USENIX conference on Annual Technical Conference
ForOpenCL: transformations exploiting array syntax in Fortran for accelerator programming
International Journal of Computational Science and Engineering
As the next generation of supercomputers reaches the exascale, the dominant design parameter governing performance will shift from hardware to software. Intelligent use of memory access, vectorization, and intranode threading will become critical to the performance of scientific applications and numerical calculations on exascale supercomputers. Although challenges remain in effectively programming the heterogeneous devices likely to be used in future supercomputers, new languages and tools are providing a pathway for application developers to tackle this new frontier. These languages include open programming standards such as OpenCL and OpenACC, as well as widely adopted languages such as CUDA; also of importance are high-quality libraries such as CUDPP and Thrust. This article surveys a purposely diverse set of proof-of-concept applications developed at Los Alamos National Laboratory. We find that the capability of accelerator computing hardware and languages has moved beyond regular-grid finite difference calculations and molecular dynamics codes. More advanced applications requiring dynamic memory allocation, such as cell-based adaptive mesh refinement, can now be addressed, and with more effort even unstructured mesh codes can be moved to the GPU.