Performance Evaluation of OpenMP Applications with Nested Parallelism

  • Authors:
  • Yoshizumi Tanaka;Kenjiro Taura;Mitsuhisa Sato;Akinori Yonezawa

  • Affiliations:
  • -;-;-;-

  • Venue:
  • LCR '00 Selected Papers from the 5th International Workshop on Languages, Compilers, and Run-Time Systems for Scalable Computers
  • Year:
  • 2000

Quantified Score

Hi-index 0.00

Visualization

Abstract

Many existing OpenMP systems do not sufficiently implement nested parallelism. This is supposedly because nested parallelism is believed to require a significant implementation effort, incur a large overhead, or lack applications. This paper demonstrates Omni/ST, a simple and efficient implementation of OpenMP nested parallelism using StackThreads/MP, which is a fine-grain thread library. Thanks to StackThreads/MP, OpenMP parallel constructs are simply mapped onto thread creation primitives of StackThreads/MP, yet they are efficiently managed with a fixed number of threads in the underlying thread package (e.g., Pthreads). Experimental results on Sun Ultra Enterprise 10000 with up to 60 processors show that overhead imposed by nested parallelism is very small (1-3% in five out of six applications, and 8% for the other), and there is a significant scalability benefit for applications with nested parallelism.