Run-time scheduling and execution of loops on message passing machines
Journal of Parallel and Distributed Computing - Special issue: algorithms for hypercube computers
The lack of performance portability has discouraged scientific application users from developing portable programs in HPF. Because users want to run the same source code on different parallel machines as fast as possible, we investigated the performance portability of Japanese HPF compilers (NEC and Fujitsu) using a special benchmark suite. On the NEC SX-5, the DISTRIBUTE and INDEPENDENT directives gave good performance in most cases, but on the Fujitsu VPP800 we had to add LOCAL directives to explicitly force no communication inside parallel loops. We also found that manual communication optimizations using the HPF/JA extensions were very useful for tuning parallel performance.