Manycore performance-portability: Kokkos multidimensional array library

Authors:
H. Carter Edwards; Daniel Sunderland; Vicki Porter; Chris Amsler; Sam Mish
Affiliations:
Computing Research Center, Sandia National Laboratories, Livermore, CA, USA; Engineering Sciences Center, Sandia National Laboratories, Albuquerque, NM, USA; Engineering Sciences Center, Sandia National Laboratories, Albuquerque, NM, USA; Department of Electrical and Computer Engineering, Kansas State University, Manhattan, KS, USA; Department of Mathematics, California State University, Los Angeles, CA, USA
Venue:
Scientific Programming - A New Overview of the Trilinos Project - Part 1
Year:
2012

Abstract

Large, complex scientific and engineering application codes have a significant investment in the computational kernels that implement their mathematical models. Porting these computational kernels to the collection of modern manycore accelerator devices is a major challenge because these devices have diverse programming models, application programming interfaces (APIs), and performance requirements. The Kokkos Array programming model provides a library-based approach to implementing computational kernels that are performance-portable to CPU-multicore and GPGPU accelerator devices. This programming model is based upon three fundamental concepts: (1) manycore compute devices, each with its own memory space, (2) data parallel kernels, and (3) multidimensional arrays. Kernel execution performance depends strongly on data access patterns, especially on NVIDIA® devices. The optimal data access pattern can differ between manycore devices, potentially leading to different implementations of computational kernels specialized for different devices. The Kokkos Array programming model supports performance-portable kernels by (1) separating data access patterns from computational kernels through a multidimensional array API and (2) introducing device-specific data access mappings when a kernel is compiled. An implementation of Kokkos Array is available through Trilinos [Trilinos website, http://trilinos.sandia.gov/, August 2011].
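
The separation of data access pattern from kernel code described above can be illustrated with a minimal C++ sketch. This is not the Kokkos Array API: the names `RowMajor`, `ColumnMajor`, `Array2D`, `fill_kernel` and `row_sum` are hypothetical and chosen only for illustration. The sketch assumes a two-dimensional array whose index-to-memory mapping is a template parameter fixed at compile time, and it uses serial loops on the host as a stand-in for a data-parallel dispatch on a real device.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical layout policies (not Kokkos names). Each maps a logical
// index (i, j) of an n0-by-n1 array to an offset in one flat allocation.
struct RowMajor {
    // j is the fastest-varying index: one thread walking a row reads
    // contiguous memory, which tends to suit CPU caches.
    static std::size_t offset(std::size_t i, std::size_t j,
                              std::size_t /*n0*/, std::size_t n1) {
        return i * n1 + j;
    }
};

struct ColumnMajor {
    // i is the fastest-varying index: work items i and i+1 touch adjacent
    // memory for the same j, the pattern GPUs can coalesce.
    static std::size_t offset(std::size_t i, std::size_t j,
                              std::size_t n0, std::size_t /*n1*/) {
        return j * n0 + i;
    }
};

// Illustrative 2-D array: the kernel sees only operator()(i, j);
// the Layout policy decides where each entry lives in memory.
template <typename T, typename Layout>
class Array2D {
public:
    Array2D(std::size_t n0, std::size_t n1)
        : n0_(n0), n1_(n1), data_(n0 * n1) {}

    T& operator()(std::size_t i, std::size_t j) {
        assert(i < n0_ && j < n1_);
        return data_[Layout::offset(i, j, n0_, n1_)];
    }
    const T& operator()(std::size_t i, std::size_t j) const {
        assert(i < n0_ && j < n1_);
        return data_[Layout::offset(i, j, n0_, n1_)];
    }

    std::size_t extent0() const { return n0_; }
    std::size_t extent1() const { return n1_; }
    const T* data() const { return data_.data(); }

private:
    std::size_t n0_, n1_;
    std::vector<T> data_;
};

// Kernel written once against logical indices; it never mentions a layout.
// The outer loop over i stands in for a parallel dispatch over work items.
template <typename Array>
void fill_kernel(Array& a) {
    for (std::size_t i = 0; i < a.extent0(); ++i)
        for (std::size_t j = 0; j < a.extent1(); ++j)
            a(i, j) = 10.0 * static_cast<double>(i) + static_cast<double>(j);
}

// Body of a single work item i: sums one logical row.
template <typename Array>
double row_sum(const Array& a, std::size_t i) {
    double s = 0.0;
    for (std::size_t j = 0; j < a.extent1(); ++j) s += a(i, j);
    return s;
}
```

Instantiating `Array2D<double, RowMajor>` and `Array2D<double, ColumnMajor>` and passing each to `fill_kernel` gives the same logical contents with different memory placement. The kernel source stays the same, and the mapping is chosen when the kernel is compiled for a given array type.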