This paper describes the Aggregate Remote Memory Copy Interface (ARMCI), a portable, high-performance remote memory access communication interface. ARMCI was originally developed under the U.S. Department of Energy (DOE) Advanced Computational Testing and Simulation Toolkit project and is currently used and advanced as part of the run-time layer of the DOE project Programming Models for Scalable Parallel Computing. The paper discusses the model, addresses the challenges of portable implementations, and demonstrates that ARMCI delivers high performance on a variety of platforms. Special emphasis is placed on the latency-hiding mechanisms and the ability to optimize noncontiguous data transfers.