Local Memory Design Space Exploration for High-Performance Computing

  • Authors:
  • Ramon Bertran;Marc Gonzàlez;Xavier Martorell;Nacho Navarro;Eduard Ayguadé

  • Venue:
  • The Computer Journal
  • Year:
  • 2011

Abstract

The performance of high-performance computing (HPC) applications depends heavily on the memory subsystem, because their huge data sets do not fit into the cache hierarchy. In addition, energy efficiency has become a main design factor, so both performance and energy efficiency are primary goals in HPC designs. As a result, energy-efficient, high-performance memory subsystem designs should be explored. In this paper, we extend the architecture of general-purpose processors by adding a software-managed local memory (LM) and a very simple programmable DMA controller. We demonstrate that these extensions, together with efficient run-time management, improve both performance and energy consumption. We perform an LM design space exploration study for an Intel® Pentium® 4 platform: we analyze the performance, energy and energy-delay product for a total of 27 computational loops of the NAS benchmarks. We show a 1.2x performance speedup and an energy reduction of 6.21% on average when using a constrained 32 KB LM with commodity memory bandwidth (6.4 GB/s). More aggressive configurations (i.e. 256 KB LM + 12.8 GB/s) show performance speedups of at least 2.14x and energy savings of 42.07% on average.
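
The abstract names the energy-delay product (EDP) as a metric but quotes only speedup and energy figures. As a rough illustration of how the three relate, the sketch below combines the quoted averages into a relative EDP, assuming EDP = energy x delay and a baseline normalized to 1.0. The function name `edp_ratio` is hypothetical, and the results are back-of-the-envelope estimates, not values reported by the paper: the product of two averages is not, in general, the average of the per-loop products.

```python
def edp_ratio(speedup: float, energy_reduction_pct: float) -> float:
    """Relative energy-delay product versus a baseline normalized to 1.0.

    Assumes EDP = energy * delay, with delay scaling as 1 / speedup.
    """
    relative_delay = 1.0 / speedup
    relative_energy = 1.0 - energy_reduction_pct / 100.0
    return relative_energy * relative_delay


# Averages quoted in the abstract (illustrative inputs only).
constrained = edp_ratio(speedup=1.2, energy_reduction_pct=6.21)    # 32 KB LM, 6.4 GB/s
aggressive = edp_ratio(speedup=2.14, energy_reduction_pct=42.07)   # 256 KB LM, 12.8 GB/s

print(f"constrained: {constrained:.3f} of baseline EDP")
print(f"aggressive:  {aggressive:.3f} of baseline EDP")
```

Under these assumptions the constrained configuration lands at roughly 0.78 of the baseline EDP and the aggressive one at roughly 0.27. Because the abstract gives 2.14x as a lower bound on speedup, the second figure is better read as an upper bound on the estimate.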