A comparison of sorting algorithms for the connection machine CM-2
SPAA '91 Proceedings of the third annual ACM symposium on Parallel algorithms and architectures
Active messages: a mechanism for integrated communication and computation
ISCA '92 Proceedings of the 19th annual international symposium on Computer architecture
CHARM++: a portable concurrent object oriented system based on C++
OOPSLA '93 Proceedings of the eighth annual conference on Object-oriented programming systems, languages, and applications
The Nexus approach to integrating multithreading and communication
Journal of Parallel and Distributed Computing - Special issue on multithreading for multiprocessors
Managing multiple communication methods in high-performance networked computing systems
Journal of Parallel and Distributed Computing - Special issue on workstation clusters and network-based computing
ICS '99 Proceedings of the 13th international conference on Supercomputing
MPI versus MPI+OpenMP on IBM SP for the NAS benchmarks
Proceedings of the 2000 ACM/IEEE conference on Supercomputing
Requirements for and evaluation of RMI protocols for scientific computing
Proceedings of the 2000 ACM/IEEE conference on Supercomputing
The trade-off between implicit and explicit data distribution in shared-memory programming paradigms
ICS '01 Proceedings of the 15th international conference on Supercomputing
Finding strongly connected components in parallel in particle transport sweeps
Proceedings of the thirteenth annual ACM symposium on Parallel algorithms and architectures
Multi-protocol active messages on a cluster of SMP's
SC '97 Proceedings of the 1997 ACM/IEEE conference on Supercomputing
The C++ Programming Language
Information Technology-Portable Operating System Interface
Information Technology-Portable Operating System Interface
Parallel Computer Architecture: A Hardware/Software Approach
Parallel Computer Architecture: A Hardware/Software Approach
Development of a Class of Distributed Termination Detection Algorithms
IEEE Transactions on Knowledge and Data Engineering
Tulip: A Portable Run-Time System for Object-Parallel Systems
IPPS '96 Proceedings of the 10th International Parallel Processing Symposium
Proceedings of the 11 IPPS/SPDP'99 Workshops Held in Conjunction with the 13th International Parallel Processing Symposium and 10th Symposium on Parallel and Distributed Processing
IPPS '98 Proceedings of the 12th. International Parallel Processing Symposium on International Parallel Processing Symposium
A framework for adaptive algorithm selection in STAPL
Proceedings of the tenth ACM SIGPLAN symposium on Principles and practice of parallel programming
Enabling rapid development of parallel tree search applications
Proceedings of the 5th IEEE workshop on Challenges of large applications in distributed environments
MEDEA '07 Proceedings of the 2007 workshop on MEmory performance: DEaling with Applications, systems and architecture
Associative Parallel Containers in STAPL
Languages and Compilers for Parallel Computing
STAPL: standard template adaptive parallel library
Proceedings of the 3rd Annual Haifa Experimental Systems Conference
The STAPL parallel container framework
Proceedings of the 16th ACM symposium on Principles and practice of parallel programming
LCPC'10 Proceedings of the 23rd international conference on Languages and compilers for parallel computing
LCPC'09 Proceedings of the 22nd international conference on Languages and Compilers for Parallel Computing
Hi-index | 0.00 |
ARMI is a communication library that provides a framework for expressing fine-grain parallelism and mapping it to a particular machine using shared-memory and message passing library calls. The library is an advanced implementation of the RMI protocol and handles low-level details such as scheduling incoming communication and aggregating outgoing communication to coarsen parallelism when necessary. These details can be tuned for different platforms to allow user codes to achieve the highest performance possible without manual modification. ARMI is used by STAPL, our generic parallel library, to provide a portable, user transparent communication layer. We present the basic design as well as the mechanisms used in the current Pthreads/OpenMP, MPI implementations and/or a combination thereof. Performance comparisons between ARMI and explicit use of Pthreads or MPI are given on a variety of machines, including an HP V2200, SGI Origin 3800, IBM Regatta-HPC and IBM RS6000 SP cluster.