Manycore performance-portability: Kokkos multidimensional array library

  • Authors:
  • H. Carter Edwards;Daniel Sunderland;Vicki Porter;Chris Amsler;Sam Mish

  • Affiliations:
  • Computing Research Center, Sandia National Laboratories, Livermore, CA, USA;Engineering Sciences Center, Sandia National Laboratories, Albuquerque, NM, USA;Engineering Sciences Center, Sandia National Laboratories, Albuquerque, NM, USA;Department of Electrical and Computer Engineering, Kansas State University, Manhattan, KS, USA;Department of Mathematics, California State University, Los Angeles, CA, USA

  • Venue:
  • Scientific Programming - A New Overview of the Trilinos Project - Part 1
  • Year:
  • 2012

Abstract

Large, complex scientific and engineering application codes have a significant investment in the computational kernels that implement their mathematical models. Porting these computational kernels to the collection of modern manycore accelerator devices is a major challenge because these devices have diverse programming models, application programming interfaces (APIs), and performance requirements. The Kokkos Array programming model provides a library-based approach to implementing computational kernels that are performance-portable to CPU-multicore and GPGPU accelerator devices. This programming model is based upon three fundamental concepts: (1) manycore compute devices, each with its own memory space, (2) data parallel kernels, and (3) multidimensional arrays. Kernel execution performance is extremely dependent on data access patterns, especially for NVIDIA® devices. The optimal data access pattern can differ between manycore devices, potentially leading to different implementations of computational kernels specialized for different devices. The Kokkos Array programming model supports performance-portable kernels by (1) separating data access patterns from computational kernels through a multidimensional array API and (2) introducing device-specific data access mappings when a kernel is compiled. An implementation of Kokkos Array is available through Trilinos [Trilinos website, http://trilinos.sandia.gov/, August 2011].
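
The idea of separating the data access pattern from the kernel and choosing the index mapping at compile time can be illustrated with a minimal C++ sketch. This is not the Kokkos Array interface: the names `LayoutRight`, `LayoutLeft`, `Array2D`, `RowSum`, `parallel_for_serial` and `row_sums` are hypothetical, and the serial loop only stands in for a device's parallel dispatch. The sketch assumes a two-dimensional array whose layout is a template parameter, so the same kernel source compiles against either mapping.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical layout policies: each maps index (i, j) of an n0 x n1 array
// to an offset in flat storage. The layout is fixed at compile time.
struct LayoutRight {  // row-major: consecutive j are adjacent in memory
    static std::size_t offset(std::size_t i, std::size_t j,
                              std::size_t /*n0*/, std::size_t n1) {
        return i * n1 + j;
    }
};

struct LayoutLeft {   // column-major: consecutive i are adjacent in memory
    static std::size_t offset(std::size_t i, std::size_t j,
                              std::size_t n0, std::size_t /*n1*/) {
        return i + j * n0;
    }
};

// Hypothetical 2-D array: kernels see only operator()(i, j);
// the Layout parameter decides where each element actually lives.
template <typename T, typename Layout>
class Array2D {
public:
    Array2D(std::size_t n0, std::size_t n1)
        : n0_(n0), n1_(n1), data_(n0 * n1) {}

    T& operator()(std::size_t i, std::size_t j) {
        assert(i < n0_ && j < n1_);
        return data_[Layout::offset(i, j, n0_, n1_)];
    }
    const T& operator()(std::size_t i, std::size_t j) const {
        assert(i < n0_ && j < n1_);
        return data_[Layout::offset(i, j, n0_, n1_)];
    }

    std::size_t extent0() const { return n0_; }
    std::size_t extent1() const { return n1_; }

private:
    std::size_t n0_, n1_;
    std::vector<T> data_;
};

// Data parallel kernel written once: work item i sums row i.
// It never mentions the memory layout.
template <typename ArrayT>
struct RowSum {
    const ArrayT& in;
    std::vector<double>& out;

    void operator()(std::size_t i) const {
        double sum = 0.0;
        for (std::size_t j = 0; j < in.extent1(); ++j) sum += in(i, j);
        out[i] = sum;
    }
};

// Serial stand-in for a device dispatch: calls the kernel for i in [0, n).
template <typename Functor>
void parallel_for_serial(std::size_t n, const Functor& kernel) {
    for (std::size_t i = 0; i < n; ++i) kernel(i);
}

// Fills an n0 x n1 array with a(i, j) = 10*i + j and returns the row sums,
// using whichever layout is selected at compile time.
template <typename Layout>
std::vector<double> row_sums(std::size_t n0, std::size_t n1) {
    Array2D<double, Layout> a(n0, n1);
    for (std::size_t i = 0; i < n0; ++i)
        for (std::size_t j = 0; j < n1; ++j)
            a(i, j) = static_cast<double>(10 * i + j);

    std::vector<double> out(n0, 0.0);
    parallel_for_serial(n0, RowSum<Array2D<double, Layout>>{a, out});
    return out;
}
```

Calling `row_sums<LayoutRight>(3, 4)` and `row_sums<LayoutLeft>(3, 4)` produces the same sums from the same kernel source; only the storage order differs. In this sketch the column-major mapping places the elements read by neighbouring work items `i` and `i + 1` next to each other, which is the kind of access pattern that tends to favour GPU threads, while the row-major mapping keeps each work item's own row contiguous, which tends to favour a CPU core.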