The Mercury Interconnect Architecture: a cost-effective infrastructure for high-performance servers

Authors:
Wolf-Dietrich Weber, Stephen Gold, Pat Helland, Takeshi Shimizu, Thomas Wicki, Winfried Wilcke
Affiliations:
HAL Computer Systems, 1315 Dell Ave, Campbell, CA (Weber, Gold, Shimizu, Wicki, Wilcke); Microsoft Corporation, One Microsoft Way, Redmond, WA (Helland)
Venue:
Proceedings of the 24th annual international symposium on Computer architecture
Year:
1997

Abstract

This paper presents HAL's Mercury Interconnect Architecture, an interconnect infrastructure designed to link commodity microprocessors, memory, and I/O components into high-performance multiprocessing servers. The interconnect supports shared-memory systems, message-passing systems, and hybrids of the two. Its key attributes are low latency, high bandwidth, a modular and flexible design, reliability/availability/serviceability (RAS) features, and a simplicity that enables very cost-effective implementations. The first implementation of the architecture links multiple 4-processor Pentium™ Pro-based nodes. In a 4-node (16-processor) shared-memory configuration, this system achieves a remote read latency of just over 1 µs and a maximum interconnect bandwidth of 6.4 GByte/s. Both figures far outpace those of comparable SCI-based solutions, while the system uses far fewer hardware components.
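The latency and bandwidth figures above can be related through Little's law (data in flight = throughput × latency). The following is a rough sketch, not taken from the paper. It assumes 32-byte transfers (the Pentium Pro cache-line size; the abstract does not state the transfer unit), rounds "just over 1 µs" down to 1 µs, reads 6.4 GByte/s as 6.4 × 10^9 bytes/s, and treats that peak as one aggregate pipe. It estimates how many remote reads would have to be outstanding at once to reach the peak bandwidth. The function names are hypothetical.

```python
# Little's law sketch: bytes in flight = bandwidth x latency.
# All constants marked "assumption" are illustrative, not from the paper.

PEAK_BANDWIDTH_BYTES_PER_S = 6_400_000_000  # 6.4 GByte/s, 1 GByte = 10**9 bytes (assumption)
REMOTE_READ_LATENCY_NS = 1_000               # "just over 1 us", rounded down (assumption)
TRANSFER_BYTES = 32                          # Pentium Pro cache-line size (assumption)
PROCESSORS = 16                              # 4 nodes x 4 processors, from the abstract


def bytes_in_flight(bandwidth_bytes_per_s: int, latency_ns: int) -> int:
    """Bytes that must be outstanding to keep the interconnect busy."""
    return bandwidth_bytes_per_s * latency_ns // 1_000_000_000


def outstanding_reads(bandwidth_bytes_per_s: int, latency_ns: int, transfer_bytes: int) -> int:
    """Concurrent remote reads needed to sustain the given bandwidth."""
    return bytes_in_flight(bandwidth_bytes_per_s, latency_ns) // transfer_bytes


if __name__ == "__main__":
    total = outstanding_reads(PEAK_BANDWIDTH_BYTES_PER_S, REMOTE_READ_LATENCY_NS, TRANSFER_BYTES)
    print(f"outstanding reads in total: {total}")      # 200
    print(f"per processor: {total / PROCESSORS:.1f}")  # 12.5
```

The abstract does not say whether the two headline figures were measured under the same load. This calculation is therefore only an order-of-magnitude illustration of how much concurrency the system would need before peak bandwidth, rather than read latency, becomes the limit.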