High performance messaging on workstations: Illinois fast messages (FM) for Myrinet

Authors:
Scott Pakin; Mario Lauria; Andrew Chien
Affiliations:
Department of Computer Science, University of Illinois at Urbana-Champaign, 1304 W. Springfield Ave., Urbana, IL; Dipartimento di Informatica e Sistemistica, Università di Napoli 'Federico II', via Claudio 21, 80125 Napoli, Italy; Department of Computer Science, University of Illinois at Urbana-Champaign, 1304 W. Springfield Ave., Urbana, IL
Venue:
Supercomputing '95 Proceedings of the 1995 ACM/IEEE conference on Supercomputing
Year:
1995

Abstract

The Convex SPP-1000 is the first commercial implementation of a new generation of scalable shared-memory parallel computers with full cache coherence. It employs a hierarchical structure of processing, communication, and memory name-space management resources to provide a scalable NUMA environment. Ensembles of 8 HP PA-RISC 7100 microprocessors employ an internal crossbar switch and a directory-based cache coherence scheme to provide a tightly coupled SMP. Up to 16 processing ensembles are interconnected by a 4-ring network incorporating a full hardware implementation of the SCI protocol, for a full system configuration of 128 processors. This paper presents the findings of a set of empirical studies, using both synthetic test codes and full applications for the Earth and space sciences, to characterize the performance properties of this new architecture. It is shown that the overheads and latencies of global primitive mechanisms, while low in absolute time, are significantly more costly than those of similar functions local to an individual processor ensemble.
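
The two-level hierarchy described in the abstract can be illustrated with a minimal sketch. The figures of 8 processors per ensemble and up to 16 ensembles come from the abstract; the linear global numbering and the names `locate` and `same_ensemble` are assumptions made purely for illustration and are not taken from the machine or the paper.

```python
from typing import NamedTuple

PROCESSORS_PER_ENSEMBLE = 8   # stated in the abstract
MAX_ENSEMBLES = 16            # stated in the abstract


class ProcessorLocation(NamedTuple):
    ensemble: int     # which processing ensemble (0-based, hypothetical numbering)
    local_index: int  # position inside that ensemble (0-based)


def total_processors(ensembles: int = MAX_ENSEMBLES) -> int:
    """Number of processors in a system built from `ensembles` ensembles."""
    if not 1 <= ensembles <= MAX_ENSEMBLES:
        raise ValueError("ensembles must be between 1 and 16")
    return ensembles * PROCESSORS_PER_ENSEMBLE


def locate(global_id: int, ensembles: int = MAX_ENSEMBLES) -> ProcessorLocation:
    """Map an assumed linear processor ID to (ensemble, local index)."""
    if not 0 <= global_id < total_processors(ensembles):
        raise ValueError("global_id out of range")
    return ProcessorLocation(*divmod(global_id, PROCESSORS_PER_ENSEMBLE))


def same_ensemble(a: int, b: int) -> bool:
    """True if both processors sit in one ensemble, i.e. they communicate
    through the ensemble's internal switch rather than the ring network."""
    return locate(a).ensemble == locate(b).ensemble
```

Under this assumed numbering, `same_ensemble` separates the two cases the abstract contrasts: operations local to an ensemble and global operations that cross the ring network, which the study reports as significantly more costly.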