Reactive NUMA: a design for unifying S-COMA and CC-NUMA

Authors:
Babak Falsafi;David A. Wood
Affiliations:
Computer Sciences Department, University of Wisconsin, Madison, 1210 W. Dayton Street, Madison, WI;Computer Sciences Department, University of Wisconsin, Madison, 1210 W. Dayton Street, Madison, WI
Venue:
Proceedings of the 24th annual international symposium on Computer architecture
Year:
1997

Citing 17
Cited 27

Experimental comparison of memory management policies for NUMA multiprocessors

ACM Transactions on Computer Systems (TOCS)
Implementation and performance of Munin

SOSP '91 Proceedings of the thirteenth ACM symposium on Operating systems principles
The Stanford Dash Multiprocessor

Computer
DDM: A Cache-Only Memory Architecture

Computer
Working sets, cache sizes, and node granularity issues for large-scale multiprocessors

ISCA '93 Proceedings of the 20th annual international symposium on computer architecture
The KSR1: experimentation and modeling of poststore

SIGMETRICS '93 Proceedings of the 1993 ACM SIGMETRICS conference on Measurement and modeling of computer systems
Parallel programming in Split-C

Proceedings of the 1993 ACM/IEEE conference on Supercomputing
IBM Power and PowerPC

IBM Power and PowerPC
The Stanford FLASH multiprocessor

ISCA '94 Proceedings of the 21st annual international symposium on Computer architecture
Hardware and software support for efficient exception handling

ASPLOS VI Proceedings of the sixth international conference on Architectural support for programming languages and operating systems
The MIT Alewife machine: architecture and performance

ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
The SPLASH-2 programs: characterization and methodological considerations

ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
STiNG: a CC-NUMA computer system for the commercial marketplace

ISCA '96 Proceedings of the 23rd annual international symposium on Computer architecture
Operating system support for improving data locality on CC-NUMA compute servers

Proceedings of the seventh international conference on Architectural support for programming languages and operating systems
A Case for NOW (Networks of Workstations)

IEEE Micro
Using simple page placement policies to reduce the cost of cache fills in coherent shared-memory systems

IPPS '95 Proceedings of the 9th International Symposium on Parallel Processing
An argument for simple COMA

HPCA '95 Proceedings of the 1st IEEE Symposium on High-Performance Computer Architecture

Flexible use of memory for replication/migration in cache-coherent DSM multiprocessors

Proceedings of the 25th annual international symposium on Computer architecture
Excel-NUMA: Toward Programmability, Simplicity, and High Performance

IEEE Transactions on Computers - Special issue on cache memory and related problems
Is SC + ILP = RC?

ISCA '99 Proceedings of the 26th annual international symposium on Computer architecture
Memory sharing predictor: the key to a speculative coherent DSM

ISCA '99 Proceedings of the 26th annual international symposium on Computer architecture
Eliminating synchronization overhead in automatically parallelized programs using dynamic feedback

ACM Transactions on Computer Systems (TOCS)
Performance experiences on Sun's Wildfire prototype

SC '99 Proceedings of the 1999 ACM/IEEE conference on Supercomputing
Selective, accurate, and timely self-invalidation using last-touch prediction

Proceedings of the 27th annual international symposium on Computer architecture
Comparing the effectiveness of fine-grain memory caching against page migration/replication in reducing traffic in DSM clusters

Proceedings of the twelfth annual ACM symposium on Parallel algorithms and architectures
MemorIES: a programmable, real-time hardware emulation tool for multiprocessor server design

ACM SIGPLAN Notices
Exploiting Network Locality for CC-NUMA Multiprocessors

The Journal of Supercomputing
MemorIES3: a programmable, real-time hardware emulation tool for multiprocessor server design

ASPLOS IX Proceedings of the ninth international conference on Architectural support for programming languages and operating systems
Cache-Only Memory Architectures

Computer
Performance of High-Accuracy PDE Solvers on a Self-Optimizing NUMA Architecture

Euro-Par '01 Proceedings of the 7th International Euro-Par Conference Manchester on Parallel Processing
The Thread-Based Protocol Engines for CC-NUMA Multiprocessors

ICPP '00 Proceedings of the Proceedings of the 2000 International Conference on Parallel Processing
Using destination-set prediction to improve the latency/bandwidth tradeoff in shared-memory multiprocessors

Proceedings of the 30th annual international symposium on Computer architecture
Managing Wire Delay in Large Chip-Multiprocessor Caches

Proceedings of the 37th annual IEEE/ACM International Symposium on Microarchitecture
Memory coherence activity prediction in commercial workloads

WMPI '04 Proceedings of the 3rd workshop on Memory performance issues: in conjunction with the 31st international symposium on computer architecture
Optimizing Replication, Communication, and Capacity Allocation in CMPs

Proceedings of the 32nd annual international symposium on Computer Architecture
Store-Ordered Streaming of Shared Memory

Proceedings of the 14th International Conference on Parallel Architectures and Compilation Techniques
ASR: Adaptive Selective Replication for CMP Caches

Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
A transparent runtime data distribution engine for OpenMP

Scientific Programming
HP scalable computing architecture

WIESS'00 Proceedings of the 1st conference on Industrial Experiences with Systems Software - Volume 1
The case for simple, visible cache coherency

Proceedings of the 2008 ACM SIGPLAN workshop on Memory systems performance and correctness: held in conjunction with the Thirteenth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS '08)
A comparative evaluation of hybrid distributed shared-memory systems

Journal of Systems Architecture: the EUROMICRO Journal
ACM: An Efficient Approach for Managing Shared Caches in Chip Multiprocessors

HiPEAC '09 Proceedings of the 4th International Conference on High Performance Embedded Architectures and Compilers
Reactive NUCA: near-optimal block placement and replication in distributed caches

Proceedings of the 36th annual international symposium on Computer architecture
Scale-out NUMA

Proceedings of the 19th international conference on Architectural support for programming languages and operating systems

Quantified Score

Hi-index	0.00

Visualization

Abstract

This paper proposes and evaluates a new approach to directory-based cache coherence protocols called Reactive NUMA (R-NUMA). An R-NUMA system combines a conventional CC-NUMA coherence protocol with a more-recent Simple-COMA (S-COMA) protocol. What makes R-NUMA novel is the way it dynamically reacts to program and system behavior to switch between CC-NUMA and S-COMA and exploit the best aspects of both protocols. This reactive behavior allows each node in an R-NUMA system to independently choose the best protocol for a particular page, thus providing much greater performance stability than either CC-NUMA or S-COMA alone. Our evaluation is both qualitative and quantitative. We first show the theoretical result that R-NUMA's worst-case performance is bounded within a small constant factor (i.e., two to three times) of the best of CC-NUMA and S-COMA. We then use detailed execution-driven simulation to show that, in practice, R-NUMA usually performs better than either a pure CC-NUMA or pure S-COMA protocol, and no more than 57% worse than the best of CC-NUMA and S-COMA, for our benchmarks and base system assumptions.