The NUMAchine Multiprocessor

Authors:
R. Grindley;T. Abdelrahman;S. Brown;S. Caranci;D. DeVries;B. Gamsa;A. Grbic;M. Gusat;R. Ho;O. Krieger;G. Lemieux;K. Loveless;N. Manjikian;P. McHardy;S. Srbljic;M. Stumm;Z. Vranesic;Z. Zilic
Venue:
ICPP '00 Proceedings of the 2000 International Conference on Parallel Processing
Year:
2000

Abstract

Small-scale multiprocessors are becoming increasingly economical and common, whereas larger multiprocessors continue to have higher per-node costs. The NUMAchine multiprocessor project seeks to make large-scale multiprocessors more economical while maintaining high performance by exploring architectural and hardware features for low-cost, modular multiprocessors. To demonstrate our approach, we have implemented a prototype system that is scalable to 128 processors. An efficient directory-based cache coherence protocol exploits our hierarchical ring-based interconnect and supports sequential consistency. This paper documents the design choices and the resulting performance of the system using both simulation results and measurements on the prototype hardware.
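
To make the term "directory-based cache coherence" concrete, the following is a minimal sketch of a generic invalidation-based directory, not the NUMAchine protocol itself, whose details the abstract does not give. The three-state model (UNCACHED, SHARED, EXCLUSIVE), the class names `DirectoryEntry` and `Directory`, and the flat sharer set are all assumptions made for illustration. The sketch also ignores the hierarchical ring interconnect and data transfer entirely.

```python
from dataclasses import dataclass, field


@dataclass
class DirectoryEntry:
    """Per-cache-line bookkeeping kept at the home memory (hypothetical layout)."""
    state: str = "UNCACHED"                      # UNCACHED, SHARED, or EXCLUSIVE
    sharers: set = field(default_factory=set)    # ids of processors holding a copy


class Directory:
    """Toy directory: tracks which processors cache each line."""

    def __init__(self):
        self.entries = {}         # line address -> DirectoryEntry
        self.invalidations = []   # log of (address, processor) invalidations sent

    def _entry(self, addr):
        return self.entries.setdefault(addr, DirectoryEntry())

    def read(self, proc, addr):
        """Record a read miss: the requester becomes a sharer."""
        entry = self._entry(addr)
        # An exclusive owner is downgraded to a sharer; real hardware would
        # also fetch the owner's up-to-date data, which this sketch omits.
        entry.state = "SHARED"
        entry.sharers.add(proc)

    def write(self, proc, addr):
        """Record a write: every other copy is invalidated first."""
        entry = self._entry(addr)
        for other in sorted(entry.sharers - {proc}):
            self.invalidations.append((addr, other))
        entry.sharers = {proc}
        entry.state = "EXCLUSIVE"


directory = Directory()
directory.read(0, 0x40)
directory.read(1, 0x40)
directory.write(2, 0x40)   # invalidates the copies held by processors 0 and 1
```

Because the directory records exactly which processors hold a line, a write sends invalidations only to those holders instead of broadcasting to every node, which is the general reason directory schemes suit larger machines.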