Evaluating NIC hardware requirements to achieve high message rate PGAS support on multi-core processors

Authors:
Keith D. Underwood; Michael J. Levenhagen; Ron Brightwell
Affiliations:
Sandia National Laboratories, Albuquerque, NM (all three authors)
Venue:
Proceedings of the 2007 ACM/IEEE conference on Supercomputing
Year:
2007

Abstract

Partitioned global address space (PGAS) programming models have been identified as one of the few viable approaches for dealing with emerging many-core systems. These models tend to generate many small messages, which require specific support from the network interface hardware to execute efficiently. In the past, Cray included E-registers on the Cray T3E to support the SHMEM API; however, with the advent of multi-core processors, the balance of computation to communication capabilities has shifted toward computation. This paper explores the message rates that are achievable with multi-core processors and simplified PGAS support on a more conventional network interface. For message rate tests, we find that simple network interface hardware is more than sufficient. We also find that even typical data distributions, such as cyclic or block-cyclic, do not need specialized hardware support. Finally, we assess the impact of such support on the well-known RandomAccess benchmark.
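To make "cyclic or block-cyclic" concrete, here is a minimal sketch (not the authors' implementation) of the address arithmetic such a distribution implies: mapping a global array index to the owning processing element (PE) and an offset within that PE's local partition. The names `Location`, `cyclic_map`, `block_cyclic_map`, and `cyclic_map_pow2` are hypothetical, chosen for illustration only. Each mapping costs only a few integer operations, and when the PE count is a power of two the cyclic case reduces to a mask and a shift. That cost profile is at least consistent with the abstract's finding that these distributions do not need dedicated hardware support.

```python
from typing import NamedTuple


class Location(NamedTuple):
    """Where one element of a distributed array lives (illustrative only)."""
    pe: int      # processing element (rank) that owns the element
    offset: int  # index within that PE's local partition


def cyclic_map(i: int, npes: int) -> Location:
    """Cyclic layout: element i goes to PE i mod npes, local slot i // npes."""
    return Location(i % npes, i // npes)


def block_cyclic_map(i: int, block: int, npes: int) -> Location:
    """Block-cyclic layout: runs of `block` consecutive elements are dealt
    round-robin to the PEs."""
    b = i // block                    # global block number
    pe = b % npes                     # PE that owns this block
    # full blocks already stored on that PE, plus position inside this block
    offset = (b // npes) * block + i % block
    return Location(pe, offset)


def cyclic_map_pow2(i: int, log2_npes: int) -> Location:
    """Cyclic layout when npes == 2**log2_npes: a mask and a shift."""
    return Location(i & ((1 << log2_npes) - 1), i >> log2_npes)


if __name__ == "__main__":
    # Show where the first ten elements land on 4 PEs (block size 2).
    for i in range(10):
        print(i, cyclic_map(i, 4), block_cyclic_map(i, 2, 4))
```

In a SHMEM-style program, the resulting `(pe, offset)` pair would be the target of a small one-sided put or get. This is why PGAS codes that touch scattered elements tend to produce the many small messages the abstract describes.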