Effects of Parallelism Degree on Run-Time Parallelization of Loops

  • Authors:
  • Chengzhong Xu

  • Venue:
  • HICSS '98 Proceedings of the Thirty-First Annual Hawaii International Conference on System Sciences - Volume 7
  • Year:
  • 1998


Abstract

Due to the overhead of exploiting and managing parallelism, run-time loop parallelization techniques that aim to maximize parallelism do not necessarily yield the best performance. In this paper, we present two parallelization techniques that exploit different degrees of parallelism for loops with dynamic cross-iteration dependences. The DOALL approach exploits iteration-level parallelism. It restructures the loop into a sequence of do-parallel loops separated by barrier operations; iterations of a do-parallel loop run in parallel. By contrast, the DOACROSS approach exposes fine-grained reference-level parallelism. It allows dependent iterations to run concurrently by inserting point-to-point synchronization operations that preserve dependences. The DOACROSS approach has variants that identify different amounts of parallelism among consecutive reads of the same memory location. We evaluate the algorithms on symmetric multiprocessors for loops with various structures, memory access patterns, and computational workloads; the algorithms are scheduled using block-cyclic decomposition strategies. The experimental results show that the DOACROSS technique outperforms the DOALL technique, even though the latter is widely used in compile-time parallelization of loops. Of the DOACROSS variants, the algorithm allowing partially concurrent reads performs best because it incurs only slightly more overhead than the algorithm disallowing concurrent reads. The benefit of allowing fully concurrent reads is significant for small loops that do not have enough parallelism, but it is likely to be outweighed by its cost for large loops or loops with light workloads.
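The DOALL/DOACROSS contrast described in the abstract can be illustrated with a small sketch. This is not the paper's implementation: `doacross`, `work`, and `deps` are hypothetical names, and Python's `threading` stands in for the symmetric-multiprocessor runtime. The key idea shown is that DOACROSS launches all iterations and uses per-iteration point-to-point synchronization, whereas a DOALL schedule would insert a full barrier between dependent phases.

```python
import threading

def doacross(n, work, deps):
    """DOACROSS-style execution sketch (illustrative, not the paper's code):
    all n iterations are launched concurrently, and iteration i blocks on
    point-to-point synchronization (one Event per iteration) until every
    source iteration in deps(i) has finished. Dependences are preserved
    without the global barriers a DOALL restructuring would insert."""
    done = [threading.Event() for _ in range(n)]
    results = [None] * n

    def run(i):
        for j in deps(i):            # wait only on the iterations i depends on
            done[j].wait()
        results[i] = work(i, results)
        done[i].set()                # signal iterations that depend on i

    threads = [threading.Thread(target=run, args=(i,)) for i in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results

# Chain dependence a[i] = a[i-1] + i: a DOALL schedule would need a barrier
# after every iteration, while DOACROSS lets independent work overlap.
prefix = doacross(5, lambda i, r: (r[i - 1] if i else 0) + i,
                  lambda i: [i - 1] if i else [])
# → [0, 1, 3, 6, 10]
```

In this sketch the dependence pattern is a worst-case chain, so no two iterations actually overlap; with sparser, dynamically discovered dependences, as in the loops the paper studies, the point-to-point scheme lets independent iterations proceed while only the truly dependent ones wait.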