Hardware-software coherence protocol for the coexistence of caches and local memories
SC '12 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
The performance of high-performance computing (HPC) applications depends heavily on the memory subsystem, because their data sets are too large to fit in the cache hierarchy. Energy efficiency has also become a main design factor, so performance and energy efficiency are both primary goals in HPC designs. As a result, energy-efficient, high-performance memory subsystem designs should be explored. In this paper, we extend the architecture of general-purpose processors by adding a software-managed local memory (LM) and a very simple programmable DMA controller. We demonstrate that these extensions, together with efficient run-time management, improve both performance and energy consumption. We perform an LM design space exploration study for an Intel® Pentium® 4 platform: we analyze the performance, energy, and energy-delay product for a total of 27 computational loops of the NAS benchmarks. We show a 1.2x performance speedup and an energy reduction of 6.21% on average when using a constrained 32 KB LM with commodity memory bandwidth (6.4 GB/s). More aggressive configurations (i.e. 256 KB LM + 12.8 GB/s) show at least 2.14x performance speedups and energy savings of 42.07% on average.
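To make the energy-delay product (EDP) metric concrete, the following is a minimal sketch of how a speedup and an energy reduction combine into a relative EDP, assuming the usual definition EDP = energy x execution time. The function name `relative_edp` and its parameters are hypothetical and not taken from the paper, and feeding it the averaged figures quoted above only gives a rough estimate, since per-loop averages of speedup and energy do not compose exactly into an average EDP.

```python
def relative_edp(speedup: float, energy_reduction_pct: float) -> float:
    """Return EDP of the new design relative to the baseline (1.0 = unchanged).

    Assumes EDP = energy * delay, so:
      relative energy = 1 - energy_reduction_pct / 100
      relative delay  = 1 / speedup
    """
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    if not 0 <= energy_reduction_pct <= 100:
        raise ValueError("energy_reduction_pct must be in [0, 100]")
    relative_energy = 1.0 - energy_reduction_pct / 100.0
    relative_delay = 1.0 / speedup
    return relative_energy * relative_delay


if __name__ == "__main__":
    # Figures quoted in the abstract, used here only as illustrative inputs.
    configs = {
        "32 KB LM, 6.4 GB/s": (1.2, 6.21),
        "256 KB LM, 12.8 GB/s": (2.14, 42.07),
    }
    for name, (speedup, reduction) in configs.items():
        print(f"{name}: relative EDP = {relative_edp(speedup, reduction):.3f}")
```

Under these assumptions, the constrained configuration lands at roughly 0.78 of the baseline EDP and the aggressive one at roughly 0.27, which illustrates why EDP rewards designs that improve both time and energy at once.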