Contention is no obstacle to shared-memory multiprocessing
Communications of the ACM - Special issue on parallelism
Deadlock-Free Message Routing in Multiprocessor Interconnection Networks
IEEE Transactions on Computers
The J-machine multicomputer: an architectural evaluation
ISCA '93 Proceedings of the 20th annual international symposium on computer architecture
The accuracy of trace-driven simulations of multiprocessors
SIGMETRICS '93 Proceedings of the 1993 ACM SIGMETRICS conference on Measurement and modeling of computer systems
METRO: a router architecture for high-performance, short-haul routing networks
ISCA '94 Proceedings of the 21st annual international symposium on Computer architecture
NIFDY: a low overhead, high throughput network interface
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
A comparison of architectural support for messaging in the TMC CM-5 and the Cray T3D
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
Proceedings of the 28th annual international symposium on Microarchitecture
The performance of simple routing algorithms that drop packets
Proceedings of the ninth annual ACM symposium on Parallel algorithms and architectures
A Cost and Speed Model for k-ary n-Cube Wormhole Routers
IEEE Transactions on Parallel and Distributed Systems
Ethernet: distributed packet switching for local computer networks
Communications of the ACM - Special 25th Anniversary Issue
Advanced Computer Architecture: Parallelism,Scalability,Programmability
Advanced Computer Architecture: Parallelism,Scalability,Programmability
Parallel Computer Architecture: A Hardware/Software Approach
Parallel Computer Architecture: A Hardware/Software Approach
Accuracy of Memory Reference Traces of Parallel Computations in Trace-Drive Simulation
IEEE Transactions on Parallel and Distributed Systems
Deadlock-Free Adaptive Routing in Multicomputer Networks Using Virtual Channels
IEEE Transactions on Parallel and Distributed Systems
How to Get Good Performance from the CM-5 Data Network
Proceedings of the 8th International Symposium on Parallel Processing
Guaranteeing Idempotence for Tightly-Coupled, Fault-Tolerant Networks
PCRCW '94 Proceedings of the First International Workshop on Parallel Computer Routing and Communication
The Reliable Router: A Reliable and High-Performance Communication Substrate for Parallel Computers
PCRCW '94 Proceedings of the First International Workshop on Parallel Computer Routing and Communication
Reliable Interconnection Networks for Parallel Computers
Reliable Interconnection Networks for Parallel Computers
Blue Gene: a vision for protein science using a petaflop supercomputer
IBM Systems Journal - Deep computing for the life sciences
Analytically Modeling a Fault-Tolerant Messaging Protocol
IEEE Transactions on Computers
Hi-index | 0.00 |
As parallel machines scale to one million nodes and beyond, it becomes increasingly difficult to build a reliable network that is able to guarantee packet delivery. Eventually large systems will need to employ fault-tolerant messaging protocols that afford correct execution in the presence of a lossy network. In this paper we present a lightweight protocol that preserves message idempotence and is easy to implement in hardware. We identify the requirements for a correct implementation of the protocol. Experiments are performed in simulation to determine implementation parameters that optimize performance. We find that an aggressive implementation on a fat tree network results in a slowdown of less than 2x compared to buffered wormhole routing on a fault-free network.