Algorithms for scalable synchronization on shared-memory multiprocessors
ACM Transactions on Computer Systems (TOCS)
Active messages: a mechanism for integrated communication and computation
ISCA '92 Proceedings of the 19th annual international symposium on Computer architecture
LogP: towards a realistic model of parallel computation
PPOPP '93 Proceedings of the fourth ACM SIGPLAN symposium on Principles and practice of parallel programming
Micro benchmark analysis of the KSR1
Proceedings of the 1993 ACM/IEEE conference on Supercomputing
Parallel programming in Split-C
Proceedings of the 1993 ACM/IEEE conference on Supercomputing
CMMD: active messages on the CM-5
Parallel Computing - Special issue: message passing interfaces
Proceedings of the seventh annual ACM symposium on Parallel algorithms and architectures
MGS: a multigrain shared memory system
ISCA '96 Proceedings of the 23rd annual international symposium on Computer architecture
The Nexus approach to integrating multithreading and communication
Journal of Parallel and Distributed Computing - Special issue on multithreading for multiprocessors
Managing multiple communication methods in high-performance networked computing systems
Journal of Parallel and Distributed Computing - Special issue on workstation clusters and network-based computing
PPOPP '97 Proceedings of the sixth ACM SIGPLAN symposium on Principles and practice of parallel programming
On the design of Chant: a talking threads package
Proceedings of the 1994 ACM/IEEE conference on Supercomputing
Assessing Fast Network Interfaces
IEEE Micro
How to Get Good Performance from the CM-5 Data Network
Proceedings of the 8th International Symposium on Parallel Processing
Experience with active messages on the Meiko CS-2
IPPS '95 Proceedings of the 9th International Symposium on Parallel Processing
A taxonomy of programming models for symmetric multiprocessors and SMP clusters
PMMP '95 Proceedings of the conference on Programming Models for Massively Parallel Computers
Active Message Applications Programming Interface
Active Message Applications Programming Interface
Design challenges of virtual networks: fast, general-purpose communication
Proceedings of the seventh ACM SIGPLAN symposium on Principles and practice of parallel programming
MagPIe: MPI's collective communication operations for clustered wide area systems
Proceedings of the seventh ACM SIGPLAN symposium on Principles and practice of parallel programming
Time Warp simulation on clumps
PADS '99 Proceedings of the thirteenth workshop on Parallel and distributed simulation
On bounding time and space for multiprocessor garbage collection
Proceedings of the ACM SIGPLAN 1999 conference on Programming language design and implementation
BIP-SMP: high performance message passing over a cluster of commodity SMPs
SC '99 Proceedings of the 1999 ACM/IEEE conference on Supercomputing
Optimization of MPI collectives on clusters of large-scale SMP's
SC '99 Proceedings of the 1999 ACM/IEEE conference on Supercomputing
A Programming Methodology for Dual-Tier Multicomputers
IEEE Transactions on Software Engineering - Special issue on architecture-independent languages and software tools for parallel processing
MPI versus MPI+OpenMP on IBM SP for the NAS benchmarks
Proceedings of the 2000 ACM/IEEE conference on Supercomputing
Optimizing threaded MPI execution on SMP clusters
ICS '01 Proceedings of the 15th international conference on Supercomputing
Communication overlap in multi-tier parallel algorithms
SC '98 Proceedings of the 1998 ACM/IEEE conference on Supercomputing
A Software Suite for High-Performance Communications on Clusters of SMPs
Cluster Computing
Protocols and Software for Exploiting Myrinet Clusters
ICCS '01 Proceedings of the International Conference on Computational Sciences-Part I
Cluster SMP Nodes with the ATOLL Network: A Look into the Future of System Area Networks
HPCN Europe 2000 Proceedings of the 8th International Conference on High-Performance Computing and Networking
Incorporating Quality-of-Service in the Virtual Interface Architecture
IPDPS '02 Proceedings of the 16th International Parallel and Distributed Processing Symposium
One-sided Communication on the Myrinet-based SMP Clusters using the GM Message-Passing Library
IPDPS '01 Proceedings of the 15th International Parallel & Distributed Processing Symposium
The MultiCluster Model to the Integrated Use of Multiple Workstation Clusters
IPDPS '00 Proceedings of the 15 IPDPS 2000 Workshops on Parallel and Distributed Processing
Iteration Space Slicing for Locality
LCPC '99 Proceedings of the 12th International Workshop on Languages and Compilers for Parallel Computing
The Data Mover: A Machine-Independent Abstraction for Managing Customized Data Motion
LCPC '99 Proceedings of the 12th International Workshop on Languages and Compilers for Parallel Computing
PaCT '999 Proceedings of the 5th International Conference on Parallel Computing Technologies
A Multiprotocol Communication Support for the Global Address Space Programming Model on the IBM SP
Euro-Par '00 Proceedings from the 6th International Euro-Par Conference on Parallel Processing
Ultra-high performance communication with MPI and the Sun fire™ link interconnect
Proceedings of the 2002 ACM/IEEE conference on Supercomputing
ARMI: an adaptive, platform independent communication library
Proceedings of the ninth ACM SIGPLAN symposium on Principles and practice of parallel programming
Optimizing Parallel Applications for Wide-Area Clusters
IPPS '98 Proceedings of the 12th. International Parallel Processing Symposium on International Parallel Processing Symposium
Managing Concurrent Access for Shared Memory Active Messages
IPPS '98 Proceedings of the 12th. International Parallel Processing Symposium on International Parallel Processing Symposium
Performance Analysis of a Myrinet-Based Cluster
Cluster Computing
On bounding time and space for multiprocessor garbage collection
ACM SIGPLAN Notices - Best of PLDI 1979-1999
Analysis of Design Considerations for Optimizing Multi-Channel MPI over InfiniBand
IPDPS '05 Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS'05) - Workshop 9 - Volume 10
Topology-aware tile mapping for clusters of SMPs
Proceedings of the 3rd conference on Computing frontiers
Performance prediction through simulation of a hybrid MPI/OpenMP application
Parallel Computing - OpenMp
ICHIT'06 Proceedings of the 1st international conference on Advances in hybrid information technology
Accelerating data movement on future chip multi-processors
Proceedings of the Second International Forum on Next-Generation Multicore/Manycore Technologies
Hi-index | 0.01 |
Clusters of multiprocessors, or Clumps, promise to be the supercomputers of the future, but obtaining high performance on these architectures requires an understanding of interactions between the multiple levels of interconnection. In this paper, we present the first multi-protocol implementation of a lightweight message layer---a version of Active Messages-II running on a cluster of Sun Enterprise 5000 servers connected with Myrinet. This research brings together several pieces of high-performance interconnection technology: bus backplanes for symmetric multiprocessors, low-latency networks for connections between machines, and simple, user-level primitives for communication. The paper describes the shared memory message-passing protocol and analyzes the multi-protocol implementation with both microbenchmarks and Split-C applications. Three aspects of the communication layer are critical to performance: the overhead of cache-coherence mechanisms, the method of managing concurrent access, and the cost of accessing state with the slower protocol. Through the use of an adaptive polling strategy, the multi-protocol implementation limits performance interactions between the protocols, delivering up to 160 MB/s of bandwidth with 3.6 microsecond end-to-end latency. Applications within an SMP benefit from this fast communication, running up to 75% faster than on a network of uniprocessor workstations. Applications running on the entire Clump are limited by the balance of NIC's to processors in our system, and are typically slower than on the NOW. These results illustrate several potential pitfalls for the Clumps architecture.