Partitioned global address space (PGAS) programming models have been identified as one of the few viable approaches for dealing with emerging many-core systems. These models tend to generate many small messages, and executing them efficiently requires specific support from the network interface hardware. In the past, Cray included E-registers in the Cray T3E to support the SHMEM API. With the advent of multi-core processors, however, the balance between computation and communication capabilities has shifted toward computation. This paper explores the message rates that are achievable with multi-core processors and simplified PGAS support on a more conventional network interface. For message rate tests, we find that simple network interface hardware is more than sufficient. We also find that even typical data distributions, such as cyclic or block-cyclic, do not need specialized hardware support. Finally, we assess the impact of such support on the well-known RandomAccess benchmark.
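Cyclic and block-cyclic distributions mainly require translating a global element index into an owning processing element (PE) and a local offset. The sketch below shows that translation in software. It is only an illustration of the general idea, not the paper's implementation. The function names, the 0-based indexing, and the convention that blocks are dealt to PEs round-robin are all assumptions. When the PE count and block size are powers of two, the arithmetic reduces to a few shifts and masks. That gives some intuition for why such translation can be cheap on a general-purpose core.

```python
def cyclic(i: int, npes: int) -> tuple[int, int]:
    """Cyclic distribution (assumed convention): element i lives on
    PE i mod npes, at local offset i // npes."""
    return i % npes, i // npes


def block_cyclic(i: int, npes: int, block: int) -> tuple[int, int]:
    """Block-cyclic distribution (assumed convention): consecutive blocks of
    `block` elements are dealt to PEs round-robin."""
    b, within = divmod(i, block)   # global block number, offset inside block
    pe = b % npes                  # owner of that block
    local_block = b // npes        # blocks this PE already holds before it
    return pe, local_block * block + within


def block_cyclic_pow2(i: int, log2_npes: int, log2_block: int) -> tuple[int, int]:
    """Same mapping as block_cyclic, assuming npes and block are powers of
    two, so it needs only shifts and masks (no division)."""
    b = i >> log2_block                        # global block number
    pe = b & ((1 << log2_npes) - 1)            # b mod npes
    within = i & ((1 << log2_block) - 1)       # i mod block
    local = ((b >> log2_npes) << log2_block) | within
    return pe, local


if __name__ == "__main__":
    # With 4 PEs and blocks of 2, element 10 sits in block 5, which is the
    # second block dealt to PE 1, so its local offset is 2.
    print(cyclic(10, 4), block_cyclic(10, 4, 2), block_cyclic_pow2(10, 2, 1))
```

Each mapping is a bijection between global indices and (PE, local offset) pairs. For example, with 4 PEs and blocks of 2 elements, global indices 0 to 63 map to 64 distinct pairs.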