Kokkos Array performance-portable manycore programming model

  • Authors:
  • H. Carter Edwards;Daniel Sunderland

  • Affiliations:
  • Sandia National Laboratories, Albuquerque, NM;Sandia National Laboratories, Albuquerque, NM

  • Venue:
  • Proceedings of the 2012 International Workshop on Programming Models and Applications for Multicores and Manycores
  • Year:
  • 2012

Abstract

Large, complex scientific and engineering application codes have a significant investment in computational kernels that implement their mathematical models. Porting these computational kernels to multicore-CPU and manycore-accelerator (e.g., NVIDIA® GPU) devices is a major challenge given the diverse programming models, application programming interfaces (APIs), and performance requirements. The Kokkos Array programming model provides a library-based approach for implementing computational kernels that are performance-portable to multicore-CPU and manycore-accelerator devices. This programming model is based upon three fundamental concepts: (1) manycore compute devices, each with its own memory space, (2) data-parallel computational kernels, and (3) multidimensional arrays. Performance portability is achieved by decoupling computational kernels from device-specific data access performance requirements (e.g., NVIDIA coalesced memory access) through an intuitive multidimensional array API. The Kokkos Array API uses C++ template meta-programming to transparently insert device-optimal data access maps into computational kernels at compile time. With this programming model, computational kernels can be written once and, without modification, compiled to multicore-CPU and manycore-accelerator devices with portable performance.
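The idea of inserting a data access map at compile time can be illustrated with a minimal C++ sketch. This is not the Kokkos Array API: the names `RowMajor`, `ColumnMajor`, `Array2D`, `fill`, and `row_sums` are hypothetical, chosen only for illustration, and the serial loops stand in for a data-parallel dispatch. The sketch assumes that a layout policy passed as a template parameter decides how an index pair (i, j) maps to memory, so the same kernel source compiles against either layout.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical layout policies (illustrative only, not Kokkos types).
// Each maps a 2-D index (i, j) to an offset in a flat buffer.
struct RowMajor {
    // j varies fastest: one work item walks contiguous memory along its row.
    static std::size_t offset(std::size_t i, std::size_t j,
                              std::size_t /*n0*/, std::size_t n1) {
        return i * n1 + j;
    }
};

struct ColumnMajor {
    // i varies fastest: neighbouring work items touch adjacent addresses,
    // the access pattern that coalesced memory access favours.
    static std::size_t offset(std::size_t i, std::size_t j,
                              std::size_t n0, std::size_t /*n1*/) {
        return i + j * n0;
    }
};

// Hypothetical 2-D array whose index-to-memory map is fixed at compile time.
template <class T, class Layout>
class Array2D {
public:
    Array2D(std::size_t n0, std::size_t n1)
        : n0_(n0), n1_(n1), data_(n0 * n1) {}

    T& operator()(std::size_t i, std::size_t j) {
        assert(i < n0_ && j < n1_);
        return data_[Layout::offset(i, j, n0_, n1_)];
    }
    const T& operator()(std::size_t i, std::size_t j) const {
        assert(i < n0_ && j < n1_);
        return data_[Layout::offset(i, j, n0_, n1_)];
    }

    std::size_t extent0() const { return n0_; }
    std::size_t extent1() const { return n1_; }
    const T* raw() const { return data_.data(); }  // flat storage, for inspection

private:
    std::size_t n0_, n1_;
    std::vector<T> data_;
};

// Kernels written once against the array interface; neither mentions a layout.
template <class Array>
void fill(Array& a) {
    for (std::size_t i = 0; i < a.extent0(); ++i)
        for (std::size_t j = 0; j < a.extent1(); ++j)
            a(i, j) = static_cast<double>(10 * i + j);
}

template <class Array>
std::vector<double> row_sums(const Array& a) {
    std::vector<double> sums(a.extent0(), 0.0);
    // Each i is an independent work item; a serial loop stands in for
    // a parallel dispatch here.
    for (std::size_t i = 0; i < a.extent0(); ++i)
        for (std::size_t j = 0; j < a.extent1(); ++j)
            sums[i] += a(i, j);
    return sums;
}
```

Instantiating `Array2D<double, RowMajor>` and `Array2D<double, ColumnMajor>` and passing both to `fill` and `row_sums` gives the same results from different underlying memory orders. In this sketch the layout is chosen by hand; in the model described above, the device-optimal map is selected by the library.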