In this paper, we compare and contrast two techniques for reducing capacity/conflict miss traffic in CC-NUMA DSM clusters: page migration/replication and R-NUMA. Page migration/replication optimizes read-write accesses to a page used by a single processor by migrating the page to that processor, and replicates all read-shared pages in the sharers' local memories. R-NUMA optimizes read-write accesses to any page by allowing a processor to cache that page in its main memory. Page migration/replication requires less hardware complexity than R-NUMA, but has limited applicability and incurs much higher overheads even with tuned hardware/software support.

We evaluate page migration/replication and R-NUMA on simulated clusters of symmetric multiprocessors executing shared-memory applications. Our results show that: (1) both page migration/replication and R-NUMA significantly improve system performance over “first-touch” migration in many applications, (2) page migration/replication has limited opportunity and cannot eliminate all capacity/conflict misses even with fast hardware support and an unlimited amount of memory, (3) R-NUMA always performs best given a page cache large enough to fit an application's primary working set, and subsumes page migration/replication, (4) R-NUMA benefits more than page migration/replication from hardware support that accelerates page operations, and (5) integrating page migration/replication into R-NUMA to help reduce the hardware cost requires sophisticated mechanisms and policies to select candidates for page migration/replication.
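The page migration/replication rule described above can be illustrated with a minimal sketch. It is not the mechanism evaluated in the paper: the `PageStats` record, the `choose_action` function, the per-node miss counters, and the `threshold` value are all hypothetical names and parameters introduced only for illustration. The sketch encodes just the two cases the abstract names (a page used by a single processor is migrated; a read-shared page is replicated) and leaves read-write shared pages in place, which is the case R-NUMA addresses by caching the page in local main memory.

```python
from dataclasses import dataclass, field


@dataclass
class PageStats:
    """Hypothetical per-page counters; not taken from the paper."""
    home: int                                   # node that currently holds the page
    reads: dict = field(default_factory=dict)   # node id -> read miss count
    writes: dict = field(default_factory=dict)  # node id -> write miss count


def choose_action(stats, threshold=4):
    """Return (action, target_nodes) for one page.

    threshold is an assumed tuning knob: pages with fewer total
    misses are left alone to avoid paying the page-operation overhead.
    """
    total = sum(stats.reads.values()) + sum(stats.writes.values())
    if total < threshold:
        return ("none", [])

    readers = {n for n, c in stats.reads.items() if c > 0}
    writers = {n for n, c in stats.writes.items() if c > 0}
    users = readers | writers

    # Case 1: page used by a single processor -> migrate it there.
    if len(users) == 1:
        (node,) = users
        return ("migrate", [node]) if node != stats.home else ("none", [])

    # Case 2: read-shared page (several users, no writes) -> replicate
    # into each remote sharer's local memory.
    if not writers:
        return ("replicate", sorted(users - {stats.home}))

    # Read-write shared page: neither migration nor replication applies.
    return ("none", [])
```

For example, a page homed on node 0 that only node 1 reads and writes would be migrated to node 1, while a page read by nodes 1 and 2 and written by no one would be replicated on both. A page that node 2 also writes falls through to `"none"`, which matches the abstract's point that migration/replication has limited applicability.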