Combining Loop Fusion with Prefetching on Shared-memory Multiprocessors

  • Authors:
  • Naraig Manjikian

  • Affiliations:
  • -

  • Venue:
  • ICPP '97: Proceedings of the International Conference on Parallel Processing
  • Year:
  • 1997

Abstract

The performance of programs consisting of parallel loops on shared-memory multiprocessors is limited by long memory latencies as processor speeds increase more rapidly than memory speeds. Two complementary techniques for addressing memory latency and improving performance are: (a) cache locality enhancement for latency reduction and (b) data prefetching for latency tolerance. This paper studies the benefit of combining loop fusion for locality enhancement with prefetching. Experimental results are reported for multiprocessors with support for prefetching. For a complete application on an SGI Power Challenge R10000, combining loop fusion with prefetching improves parallel speedup by 46%.
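
As a rough illustration of the two techniques named in the abstract, the following C sketch fuses two loops that sweep the same array and adds a software prefetch inside the fused loop. It is not taken from the paper: the function names, the array computation, and the `PREFETCH_DIST` value are hypothetical, the loops are sequential rather than parallel, and `__builtin_prefetch` is a GCC/Clang builtin assumed to be available (the hint is skipped on other compilers).

```c
#include <stddef.h>

/* Hypothetical prefetch distance in elements; a real value would be tuned
   to the memory latency and the work per iteration. */
#define PREFETCH_DIST 16

/* Unfused version: two loops, so a[] and b[] are each swept twice.
   For large n, the data may be evicted from the cache between sweeps. */
void unfused(const double *a, double *b, double *c, size_t n)
{
    for (size_t i = 0; i < n; i++)
        b[i] = 2.0 * a[i];
    for (size_t i = 0; i < n; i++)
        c[i] = a[i] + b[i];
}

/* Fused version with prefetching: a[i] and b[i] are reused while still
   in the cache (latency reduction), and a future element of a[] is
   requested early (latency tolerance). */
void fused_prefetch(const double *a, double *b, double *c, size_t n)
{
    for (size_t i = 0; i < n; i++) {
#if defined(__GNUC__)
        /* Assumption: GCC/Clang builtin; arguments are address,
           0 = read access, 1 = low temporal locality. */
        if (i + PREFETCH_DIST < n)
            __builtin_prefetch(&a[i + PREFETCH_DIST], 0, 1);
#endif
        b[i] = 2.0 * a[i];
        c[i] = a[i] + b[i];
    }
}
```

Both functions compute the same results; fusion is legal here because iteration i of the second loop depends only on values produced in iteration i of the first. The prefetch is only a hint and does not change the computed values.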