When performing source-to-source compilation of Co-array Fortran (CAF) programs into SPMD Fortran 90 codes for shared-memory multiprocessors, there are several ways of representing and manipulating data at the Fortran 90 language level. We describe a set of implementation alternatives and evaluate their performance implications for CAF variants of the STREAM, Random Access, Spark98 and NAS MG & SP benchmarks. We compare the performance of library-based implementations of one-sided communication with fine-grain communication that accesses remote data using load and store operations. Our experiments show that using application-level loads and stores for fine-grain communication can improve performance by as much as a factor of 24; however, codes requiring only coarse-grain communication can achieve better performance by using an architecture's tuned memcpy for bulk data movement.
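The two access styles compared above can be sketched in C. This is a minimal illustration under stated assumptions, not the paper's generated code: on a shared-memory machine, "remote" data is modeled as an ordinary array reachable through a pointer, and the function names `get_fine_grain` and `get_bulk` are hypothetical. The first routine reaches the data with plain loads and stores, one element at a time; the second stands in for a library-style one-sided get that moves a contiguous block with `memcpy`.

```c
#include <stddef.h>
#include <string.h>

/* Fine-grain style (hypothetical name): each element of the "remote"
 * array is read with an ordinary load and written with an ordinary
 * store, so the compiler can schedule and optimize the accesses inline. */
void get_fine_grain(double *dst, const double *remote, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        dst[i] = remote[i];
    }
}

/* Coarse-grain style (hypothetical name): a stand-in for a one-sided
 * library get that hands the whole contiguous block to the platform's
 * memcpy. The per-call overhead is paid once for all n elements. */
void get_bulk(double *dst, const double *remote, size_t n)
{
    memcpy(dst, remote, n * sizeof *dst);
}
```

Both routines produce the same result; they differ only in how the transfer is expressed. Routing a single-element access through a call like `get_bulk` adds overhead to every reference, while a large contiguous transfer amortizes that overhead and can use a tuned copy routine.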