Parallel memory prediction for fused linear algebra kernels

  • Authors:
  • Ian Karlin;Elizabeth Jessup;Geoffrey Belter;Jeremy G. Siek

  • Affiliations:
  • University of Colorado, Boulder;University of Colorado, Boulder;University of Colorado, Boulder;University of Colorado, Boulder

  • Venue:
  • ACM SIGMETRICS Performance Evaluation Review - Special issue on the 1st international workshop on performance modeling, benchmarking and simulation of high performance computing systems (PMBS 10)
  • Year:
  • 2011

Quantified Score

Hi-index 0.00

Visualization

Abstract

The performance of many scientific programs is limited by data movement. Loop fusion is one optimization used to increase the speed of memory bound operations. To automate loop fusion for matrix computations, we developed the Build to Order (BTO) compiler. Within BTO, an analytic memory model efficiently and accurately reduces the number of serial loop fusion options considered. In this paper, we extend the model to shared memory parallel machines. We detail the differences between parallel and serial memory use and runtime prediction and explain the changes made to include parallel machines in the model. Analysis of the parallel model's predictions show that when it is included in BTO it will reduce the search space of considered routines.