Implicit and explicit optimizations for stencil computations

  • Authors:
  • Shoaib Kamil; Kaushik Datta; Samuel Williams; Leonid Oliker; John Shalf; Katherine Yelick

  • Affiliations:
  • Lawrence Berkeley National Laboratory, Berkeley, CA; University of California, Berkeley, CA; University of California, Berkeley, CA; Lawrence Berkeley National Laboratory, Berkeley, CA; Lawrence Berkeley National Laboratory, Berkeley, CA; Lawrence Berkeley National Laboratory, Berkeley, CA and University of California, Berkeley, CA

  • Venue:
  • Proceedings of the 2006 Workshop on Memory System Performance and Correctness
  • Year:
  • 2006

Abstract

Stencil-based kernels constitute the core of many scientific applications on block-structured grids. Unfortunately, these codes achieve a low fraction of peak performance, due primarily to the disparity between processor and main memory speeds. We examine several optimizations on both the conventional cache-based memory systems of the Itanium 2, Opteron, and Power5 and the heterogeneous multicore design of the Cell processor. The optimizations target cache reuse across stencil sweeps, including both an implicit cache-oblivious approach and a cache-aware algorithm blocked to match the cache structure. Finally, we consider stencil computations on a machine with an explicitly managed memory hierarchy, the Cell processor. Overall, results show that a cache-aware approach is significantly faster than a cache-oblivious approach and that the explicitly managed memory on Cell is more efficient: relative to the Power5, it has almost 2x more memory bandwidth and is 3.7x faster.
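
To make the idea of blocking a stencil kernel concrete, the following is a minimal sketch and not the authors' implementation. It assumes a 2D five-point Jacobi stencil chosen for brevity, and the function names and block sizes `bi` and `bj` are hypothetical. It tiles a single sweep only; the optimizations described in the abstract go further and target reuse across multiple sweeps, which this sketch does not attempt.

```python
def sweep_naive(src, dst):
    """One Jacobi sweep of a five-point stencil over the grid interior."""
    n, m = len(src), len(src[0])
    for i in range(1, n - 1):
        for j in range(1, m - 1):
            dst[i][j] = 0.25 * (src[i - 1][j] + src[i + 1][j]
                                + src[i][j - 1] + src[i][j + 1])


def sweep_blocked(src, dst, bi=4, bj=4):
    """Same sweep, visiting the interior in bi x bj tiles.

    The block sizes are illustrative assumptions; a cache-aware code
    would choose them to match the cache of the target machine.
    """
    n, m = len(src), len(src[0])
    for ii in range(1, n - 1, bi):          # tile origin, rows
        for jj in range(1, m - 1, bj):      # tile origin, columns
            for i in range(ii, min(ii + bi, n - 1)):
                for j in range(jj, min(jj + bj, m - 1)):
                    dst[i][j] = 0.25 * (src[i - 1][j] + src[i + 1][j]
                                        + src[i][j - 1] + src[i][j + 1])


def make_grid(n, m):
    """Deterministic test grid (arbitrary values, for illustration only)."""
    return [[float((i * 31 + j * 17) % 11) for j in range(m)] for i in range(n)]


grid = make_grid(10, 13)
out_naive = [row[:] for row in grid]      # boundary values are copied, not updated
out_blocked = [row[:] for row in grid]
sweep_naive(grid, out_naive)
sweep_blocked(grid, out_blocked)
print(out_naive == out_blocked)
```

Both functions read from `src` and write to a separate `dst`, so the order in which points are visited does not change the result; tiling changes only the memory access pattern. In pure Python the tiling brings no speedup, so the sketch shows loop structure rather than performance.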