This paper presents HAL's Mercury Interconnect Architecture, an interconnect infrastructure designed to link commodity microprocessors, memory, and I/O components into high-performance multiprocessing servers. The interconnect supports shared-memory, message-passing, and hybrid systems. The key attributes of the Mercury Interconnect Architecture are low latency, high bandwidth, a modular and flexible design, reliability/availability/serviceability (RAS) features, and a simplicity that enables very cost-effective implementations. The first implementation of the architecture links multiple 4-processor Pentium™ Pro-based nodes. In a 4-node (16-processor) shared-memory configuration, this system achieves a remote read latency of just over 1 µs and a maximum interconnect bandwidth of 6.4 GByte/s. Both figures far outpace comparable SCI-based solutions while using many fewer hardware components.