There is rapidly increasing demand for very-high-performance networked communication for workstation clusters, distributed databases, multiprocessors, industrial data acquisition and control systems, shared access to distributed data, and so on. Higher-bandwidth hardware running the traditional protocols is not sufficient. Even at 100 Mb/s, and certainly at 250 Mb/s, throughput for many applications is so limited by architecturally induced delays, such as software overheads (often hundreds of microseconds), that higher bandwidth generally raises cost without improving performance. Reaping the full benefit of the far higher bandwidths that modern hardware can provide requires a new approach to communication, one that eliminates the delay due to software overheads. The Scalable Coherent Interface (SCI) solves this problem by using the distributed-shared-memory paradigm, typically offering submicrosecond delays and bandwidths currently in the range of 1250 to 8000 Mb/s per network node. The article first reviews the general properties that an appropriate system architecture should have and introduces an architectural model, the local area multiprocessor, distinguished by its shared-memory performance and its ability to handle LAN-style distances. These desired properties are then considered in more detail, and practical design decisions are made, illustrated by the evolution of the ISO/ANSI/IEEE standard SCI as it addressed these issues. Finally, the current status of the various SCI follow-on and support projects is reported.
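The claim that extra bandwidth buys little when software overhead dominates can be checked with simple arithmetic. The sketch below is illustrative only and is not taken from the article: it assumes a fixed per-message overhead and a single message in flight, and the 1 KiB message size, the 200-microsecond overhead, and the 1-microsecond overhead are hypothetical values chosen to match the orders of magnitude quoted above.

```python
def effective_throughput_mbps(message_bytes, link_mbps, overhead_us):
    """Effective throughput in Mb/s for one message sent with a fixed overhead.

    Assumed model (not from the article): total time = overhead + wire time.
    1 Mb/s equals 1 bit per microsecond, so bits / link_mbps is in microseconds.
    """
    bits = message_bytes * 8
    wire_time_us = bits / link_mbps
    return bits / (wire_time_us + overhead_us)


if __name__ == "__main__":
    message_bytes = 1024  # hypothetical message size
    # Link rates are the figures quoted in the abstract.
    for link_mbps in (100, 250, 1250):
        # 200 us stands in for "hundreds of microseconds" of software overhead.
        slow = effective_throughput_mbps(message_bytes, link_mbps, 200.0)
        # 1 us stands in for a roughly microsecond-scale delay.
        fast = effective_throughput_mbps(message_bytes, link_mbps, 1.0)
        print(f"{link_mbps:>5} Mb/s link: {slow:7.1f} Mb/s with 200 us overhead, "
              f"{fast:7.1f} Mb/s with 1 us overhead")
```

Under these assumptions, throughput with the 200-microsecond overhead stays below 41 Mb/s (8192 bits divided by 200 microseconds) at every link rate, while cutting the overhead to about a microsecond lets throughput track the link rate far more closely.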