The DASH prototype: implementation and performance
ISCA '92 Proceedings of the 19th annual international symposium on Computer architecture
The Stanford FLASH multiprocessor
ISCA '94 Proceedings of the 21st annual international symposium on Computer architecture
Scalable cache consistency for hierarchically structured multiprocessors
The Journal of Supercomputing
The MIT Alewife machine: architecture and performance
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
DAC '96 Proceedings of the 33rd annual Design Automation Conference
A Performance Comparison of Hierarchical Ring- and Mesh- Connected Multiprocessor Networks
HPCA '97 Proceedings of the 3rd IEEE Symposium on High-Performance Computer Architecture
Tornado: maximizing locality and concurrency in a shared memory multiprocessor operating system
OSDI '99 Proceedings of the third symposium on Operating systems design and implementation
kappa NUMA: A Model for Clusters of SMP-Machines
PPAM '01 Proceedings of the th International Conference on Parallel Processing and Applied Mathematics-Revised Papers
Interactive locality optimization on NUMA architectures
Proceedings of the 2003 ACM symposium on Software visualization
ICPP '00 Proceedings of the Proceedings of the 2000 International Conference on Parallel Processing
ACM Transactions on Embedded Computing Systems (TECS)
Project Kittyhawk: building a global-scale computer: Blue Gene/P as a generic computing platform
ACM SIGOPS Operating Systems Review
Experience with building a commodity intel-based ccNUMA system
IBM Journal of Research and Development
WOMPAT'04 Proceedings of the 5th international conference on OpenMP Applications and Tools: shared Memory Parallel Programming with OpenMP
Journal of Systems Architecture: the EUROMICRO Journal
Hi-index | 0.00 |
This paper describes the design and implementation of the NUMAchine multiprocessor. As the market for CC-NUMA multiprocessors expands, this research project provides a timely architectural design and cost-effective prototype. The key to the successful implementation of our 48-processor prototype is the use of off-the-shelf components and programmable logic devices. Since this machine will serve as a research vehicle for parallel software development, a number of hardware features to enhance experimentation have been included in the design.