An integrated compile-time/run-time software distributed shared memory system
Proceedings of the seventh international conference on Architectural support for programming languages and operating systems
Co-array Fortran for parallel programming
ACM SIGPLAN Fortran Forum
MPI versus MPI+OpenMP on IBM SP for the NAS benchmarks
Proceedings of the 2000 ACM/IEEE conference on Supercomputing
UPC performance and potential: a NPB experimental study
Proceedings of the 2002 ACM/IEEE conference on Supercomputing
A performance analysis of the Berkeley UPC compiler
ICS '03 Proceedings of the 17th annual international conference on Supercomputing
Performance Monitoring and Evaluation of a UPC Implementation on a NUMA Architecture
IPDPS '03 Proceedings of the 17th International Symposium on Parallel and Distributed Processing
GASNet Specification, v1.1
Benchmark Measurements of Current UPC Platforms
IPDPS '05 Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS'05) - Workshop 15 - Volume 16
An evaluation of global address space languages: co-array fortran and unified parallel C
Proceedings of the tenth ACM SIGPLAN symposium on Principles and practice of parallel programming
Shared memory programming for large scale machines
Proceedings of the 2006 ACM SIGPLAN conference on Programming language design and implementation
Performance and environment monitoring for continuous program optimization
IBM Journal of Research and Development
Running OpenMP applications efficiently on an everything-shared SDSM
Journal of Parallel and Distributed Computing - Special issue: 18th International parallel and distributed processing symposium
A performance model for fine-grain accesses in UPC
IPDPS'06 Proceedings of the 20th international conference on Parallel and distributed processing
Automatic communication coalescing for irregular computations in UPC language
CASCON '12 Proceedings of the 2012 Conference of the Center for Advanced Studies on Collaborative Research
Hi-index | 0.00 |
The main attraction of Partitioned Global Address Space (PGAS) languages to programmers is the ability to distribute the data to exploit the affinity of threads within shared-memory domains. Thus, PGAS languages, such as Unified Parallel C (UPC), are a promising programming paradigm for emerging parallel machines that employ hierarchical data- and task-parallelism. For example, large systems are built as distributed-shared memory architectures, where multicore nodes access a local, coherent address space and many such nodes are interconnected in a non-coherent address space to form a high-performance system. This paper studies the access patterns of shared data in UPC programs. By analyzing the access patterns of shared data in UPC we are able to make three major observations about the characteristics of programs written in a PGAS programming model: (i) there is strong evidence to support the development of automatic identification and automatic privatization of local shared data accesses; (ii) the ability for the programmer to specify how shared data is distributed among the executing threads can result in significant performance improvements; (iii) running UPC programs on a hybrid architecture will significantly increase the opportunities for automatic privatization of local shared data accesses.