Exploiting processor groups is becoming increasingly important for programming next-generation high-end systems composed of tens or hundreds of thousands of processors. This paper discusses the requirements, functionality, and development of multilevel parallelism based on processor groups in the context of the Global Arrays (GA) shared memory programming model. The main effort involves management of shared data rather than interprocessor communication. Experimental results for the NAS NPB Conjugate Gradient (CG) benchmark and a molecular dynamics (MD) application are presented for a Linux cluster with Myrinet; they illustrate the value of the proposed approach for improving scalability. While the original GA version of the CG benchmark lagged behind MPI, the processor-group version outperforms MPI in all cases except for a few points on the smallest problem size. Similarly, processor groups were very effective in improving the scalability of the MD application.