The High Performance Fortran Handbook
OpenMP: An Industry-Standard API for Shared-Memory Programming
IEEE Computational Science & Engineering
ICPP '02 Proceedings of the 2001 International Conference on Parallel Processing
A performance analysis of the Berkeley UPC compiler
ICS '03 Proceedings of the 17th annual international conference on Supercomputing
X10: an object-oriented approach to non-uniform cluster computing
OOPSLA '05 Proceedings of the 20th annual ACM SIGPLAN conference on Object-oriented programming, systems, languages, and applications
Parallel Programmability and the Chapel Language
International Journal of High Performance Computing Applications
Mapping normalization technique on the HPF compiler fhpf
ISHPC'05/ALPS'06 Proceedings of the 6th international symposium on high-performance computing and 1st international conference on Advanced low power systems
Optimizing bandwidth limited problems using one-sided communication and overlap
IPDPS'06 Proceedings of the 20th international conference on Parallel and distributed processing
ICPPW '10 Proceedings of the 2010 39th International Conference on Parallel Processing Workshops
XcalableMP implementation and performance of NAS Parallel Benchmarks
Proceedings of the Fourth Conference on Partitioned Global Address Space Programming Model
XcalableMP (XMP) is a PGAS parallel language defined as a directive-based extension of C and Fortran. While it supports "coarray" as a local-view programming model, the XMP global-view programming model is useful for parallelizing data-parallel programs by adding directives with minimal code modification. This paper examines the productivity and performance of the XMP global-view programming model. In the global-view model, the programmer describes data distributions and work mapping so that computations are mapped to the nodes where the computed data reside. Global-view communication directives move parts of the distributed data globally and maintain consistency in the shadow areas. The rich set of XMP global-view directives can reduce the cost of parallelization significantly, and the "privatization" optimization is not necessary. For the productivity and performance study, the Omni XMP compiler and the Berkeley Unified Parallel C (UPC) compiler are used. Experimental results show that XMP can implement the benchmarks at a smaller programming cost than UPC. Furthermore, XMP achieves higher access performance than UPC for global data that have affinity with the owning process. In addition, the XMP coarray feature can be used effectively to tune application performance.