The Stanford Dash Multiprocessor
Computer
Comparative performance evaluation of cache-coherent NUMA and COMA architectures
ISCA '92 Proceedings of the 19th annual international symposium on Computer architecture
Simulation of multiprocessors: accuracy and performance
Simulation of multiprocessors: accuracy and performance
STiNG: a CC-NUMA computer system for the commercial marketplace
ISCA '96 Proceedings of the 23rd annual international symposium on Computer architecture
Operating system support for improving data locality on CC-NUMA compute servers
Proceedings of the seventh international conference on Architectural support for programming languages and operating systems
Reactive NUMA: a design for unifying S-COMA and CC-NUMA
Proceedings of the 24th annual international symposium on Computer architecture
Reducing Remote Conflict Misses: NUMA with Remote Cache versus COMA
HPCA '97 Proceedings of the 3rd IEEE Symposium on High-Performance Computer Architecture
Hi-index | 0.00 |
While hardware-coherent scalable shared-memory multiprocessors are relatively easy to program, they still require substantial programming effort to deliver high performance. Specifically, to minimize remote accesses, data must be carefully laid out in memory for locality and application working sets carefully tuned for caches. It has been claimed that this programming effort is less necessary in hardware COMA machines like Flat-COMA thanks to automatic line-based data migration. Unfortunately, Flat-COMA is complex to design. Consequently, we would like a machine as programmable as Flat-COMA, as simple as plain CC-NUMA, and that outperforms both. This paper presents our proposal: Excel-NUMA (EX-NUMA). The idea is to exploit the fact that, after a memory line is written and cached, the storage that kept the line in memory is unutilized. We use that storage to temporarily hold remote data displaced from the local caches. This enables automatic data migration, like in Flat-COMA, enhancing programmability. The hardware required to manage the system is a simple, local module added to a CC-NUMA; the global cache coherence protocol is not changed. Simulations of Splash2 applications show that EX-NUMA outperforms CC-NUMA and Flat-COMA in every single application and eliminates most of the conflict misses.