Multi-protocol active messages on a cluster of SMP's

Authors:
Steven S. Lumetta; Alan M. Mainwaring; David E. Culler
Affiliations:
University of California, Berkeley (all authors)
Venue:
SC '97 Proceedings of the 1997 ACM/IEEE conference on Supercomputing
Year:
1997

Abstract

Clusters of multiprocessors, or Clumps, promise to be the supercomputers of the future, but obtaining high performance on these architectures requires an understanding of interactions between the multiple levels of interconnection. In this paper, we present the first multi-protocol implementation of a lightweight message layer: a version of Active Messages-II running on a cluster of Sun Enterprise 5000 servers connected with Myrinet. This research brings together several pieces of high-performance interconnection technology: bus backplanes for symmetric multiprocessors, low-latency networks for connections between machines, and simple, user-level primitives for communication. The paper describes the shared-memory message-passing protocol and analyzes the multi-protocol implementation with both microbenchmarks and Split-C applications. Three aspects of the communication layer are critical to performance: the overhead of cache-coherence mechanisms, the method of managing concurrent access, and the cost of accessing state with the slower protocol. Through the use of an adaptive polling strategy, the multi-protocol implementation limits performance interactions between the protocols, delivering up to 160 MB/s of bandwidth with 3.6 microsecond end-to-end latency. Applications within an SMP benefit from this fast communication, running up to 75% faster than on a network of uniprocessor workstations (NOW). Applications running on the entire Clump are limited by the balance of NICs to processors in our system, and are typically slower than on the NOW. These results illustrate several potential pitfalls for the Clumps architecture.
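The abstract does not spell out the adaptive polling strategy, so the following is only an illustrative sketch of one plausible form of it, not the authors' implementation. It assumes that the cheap protocol (shared memory) is checked on every poll, while the expensive protocol (network) is checked at an interval that doubles while the network stays idle and resets when traffic arrives. The class name `AdaptivePoller`, the two `deque` queues standing in for the protocols, and the interval bounds are all hypothetical.

```python
from collections import deque


class AdaptivePoller:
    """Hypothetical sketch: poll a cheap queue always, an expensive one adaptively."""

    def __init__(self, min_interval=1, max_interval=32):
        self.shared_mem = deque()  # stand-in for the fast shared-memory protocol
        self.network = deque()     # stand-in for the slower network protocol
        self.min_interval = min_interval
        self.max_interval = max_interval
        self.interval = min_interval   # polls between network checks (assumed policy)
        self.calls_since_net_poll = 0
        self.network_polls = 0         # counts how often the slow path was touched

    def poll(self):
        """Return all messages delivered by this poll, in arrival order per queue."""
        delivered = []

        # The cheap protocol is drained on every call.
        while self.shared_mem:
            delivered.append(self.shared_mem.popleft())

        # The expensive protocol is only checked once per `interval` calls.
        self.calls_since_net_poll += 1
        if self.calls_since_net_poll >= self.interval:
            self.calls_since_net_poll = 0
            self.network_polls += 1
            if self.network:
                while self.network:
                    delivered.append(self.network.popleft())
                # Traffic seen: return to frequent network polling.
                self.interval = self.min_interval
            else:
                # Idle network: back off, up to the configured ceiling.
                self.interval = min(self.interval * 2, self.max_interval)
        return delivered


if __name__ == "__main__":
    poller = AdaptivePoller()
    for _ in range(100):
        poller.poll()
    print("network polls in 100 idle calls:", poller.network_polls)
```

Under this assumed policy, a process that communicates only within its SMP pays for the slower protocol on a small fraction of its polls, which is the kind of limited interaction between protocols the abstract describes. The trade-off is that the first network message after an idle period may wait for up to `max_interval` polls before it is noticed.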