The Impact of Parallel Loop Scheduling Strategies on Prefetching in a Shared Memory Multiprocessor

  • Authors:
  • D. J. Lilja

  • Venue:
  • IEEE Transactions on Parallel and Distributed Systems
  • Year:
  • 1994


Abstract

Trace-driven simulations of numerical Fortran programs are used to study the impact of the parallel loop scheduling strategy on data prefetching in a shared memory multiprocessor with private data caches. The simulations indicate that to maximize memory performance, it is important to schedule blocks of consecutive iterations to execute on each processor, and then to adaptively prefetch single-word cache blocks to match the number of iterations scheduled. Prefetching multiple single-word cache blocks on a miss reduces the miss ratio by approximately 5% to 30% compared to a system with no prefetching. In addition, the proposed adaptive prefetching scheme further reduces the miss ratio while significantly reducing the false sharing among cache blocks compared to nonadaptive prefetching strategies. Reducing the false sharing causes fewer coherence invalidations to be generated, and thereby reduces the total network traffic. The impact of the prefetching and scheduling strategies on the temporal distribution of coherence invalidations is also examined. It is found that invalidations tend to be evenly distributed throughout the execution of parallel loops, but tend to be clustered when executing sequential program sections. The distribution of invalidations in both types of program sections is relatively insensitive to the prefetching and scheduling strategy.
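The scheduling idea described in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation; the function names, the per-iteration word count, and the prefetch cap (`max_prefetch`) are illustrative assumptions. It shows block (chunked) scheduling, where each processor receives a contiguous range of iterations, and an adaptive choice of how many single-word cache blocks to prefetch, matched to the size of the chunk assigned to that processor:

```python
def block_schedule(n_iters, n_procs):
    """Assign contiguous iteration ranges [start, end) to each processor,
    so that consecutive iterations (and their data) stay on one processor."""
    base, extra = divmod(n_iters, n_procs)
    chunks, start = [], 0
    for p in range(n_procs):
        size = base + (1 if p < extra else 0)
        chunks.append((start, start + size))
        start += size
    return chunks

def adaptive_prefetch_count(chunk, words_per_iter=1, max_prefetch=8):
    """On a miss, prefetch only as many single-word blocks as this
    processor's chunk will actually consume, capped at a hypothetical
    hardware limit (both parameters are illustrative assumptions)."""
    start, end = chunk
    return min((end - start) * words_per_iter, max_prefetch)

chunks = block_schedule(10, 4)
# chunks -> [(0, 3), (3, 6), (6, 8), (8, 10)]
degrees = [adaptive_prefetch_count(c) for c in chunks]
# degrees -> [3, 3, 2, 2]
```

Matching the prefetch degree to the chunk size is what limits false sharing: a processor never fetches words that belong to iterations scheduled on another processor, so those words cannot be invalidated out from under it.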