Large-scale parallel applications are inherently multi-paradigm and multi-grain. The key to improving the performance of parallel application systems is to choose parallel paradigms and granularities suited to the nature of the practical problem, so a programming interface supporting multiple paradigms and grains is needed for developing large-scale parallel application systems. This paper proposes a multi-paradigm, multi-grain parallel execution model that integrates coarse-grain parallelism (across macro tasks), mid-grain parallelism (across basic program blocks), and fine-grain parallelism (within repetition blocks). The model also supports task-parallel, data-parallel, and sequential execution. We further discuss a programming mechanism for this model based on an extended OpenMP specification. The extensions cover computing-resource partitioning, the definition of task groups at different granularities, the mapping of task groups onto their respective processor groups, out-of-core computation, asynchronous parallel I/O, and the specification of sequential ordering relationships among tasks. We compare the performance of benchmark implementations that use the same numerical algorithm but different programming approaches, including MPI, MPI + OpenMP, and our extended OpenMP, and we also discuss a case study based on an SMP-cluster and network storage architecture.
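For context, the MPI + OpenMP baseline in the comparison above follows the familiar hybrid pattern: coarse-grain distribution of work across processes with MPI, and fine-grain loop parallelism within each SMP node with OpenMP. The following is a minimal sketch using only standard MPI and OpenMP APIs; the vector-sum kernel, problem size, and variable names are illustrative and are not taken from the paper.

    /* Hybrid MPI + OpenMP sketch: coarse grain across nodes, fine grain within a node. */
    #include <mpi.h>
    #include <omp.h>
    #include <stdio.h>
    #include <stdlib.h>

    int main(int argc, char **argv)
    {
        int rank, nprocs;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

        const long n = 1L << 20;          /* illustrative problem size */
        long chunk = n / nprocs;          /* coarse grain: one block per MPI process */
        double *a = malloc(chunk * sizeof(double));
        for (long i = 0; i < chunk; i++)
            a[i] = (double)(rank * chunk + i);

        double local = 0.0, global = 0.0;
        /* Fine grain: OpenMP threads within the SMP node share the loop. */
        #pragma omp parallel for reduction(+:local)
        for (long i = 0; i < chunk; i++)
            local += a[i];

        MPI_Reduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
        if (rank == 0)
            printf("sum = %f\n", global);

        free(a);
        MPI_Finalize();
        return 0;
    }

The extended OpenMP approach proposed in the paper aims to express both levels of this hierarchy (and the I/O and ordering concerns listed above) in a single directive-based program, rather than splitting the coarse grain into explicit message passing as the hybrid version does.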