Comparing the effectiveness of fine-grain memory caching against page migration/replication in reducing traffic in DSM clusters

Authors:
An-Chow Lai;Babak Falsafi
Affiliations:
School of Electrical & Computer Engineering, Purdue University, 1285 EE Building, West Lafayette, IN;School of Electrical & Computer Engineering, Purdue University, 1285 EE Building, West Lafayette, IN
Venue:
Proceedings of the twelfth annual ACM symposium on Parallel algorithms and architectures
Year:
2000

Citing 18
Cited 0

Experimental comparison of memory management policies for NUMA multiprocessors

ACM Transactions on Computer Systems (TOCS)
The Stanford Dash Multiprocessor

Computer
The SPLASH-2 programs: characterization and methodological considerations

ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
STiNG: a CC-NUMA computer system for the commercial marketplace

ISCA '96 Proceedings of the 23rd annual international symposium on Computer architecture
Operating system support for improving data locality on CC-NUMA compute servers

Proceedings of the seventh international conference on Architectural support for programming languages and operating systems
The Mercury Interconnect Architecture: a cost-effective infrastructure for high-performance servers

Proceedings of the 24th annual international symposium on Computer architecture
Reactive NUMA: a design for unifying S-COMA and CC-NUMA

Proceedings of the 24th annual international symposium on Computer architecture
The SGI Origin: a ccNUMA highly scalable server

Proceedings of the 24th annual international symposium on Computer architecture
Memory system characterization of commercial workloads

Proceedings of the 25th annual international symposium on Computer architecture
Flexible use of memory for replication/migration in cache-coherent DSM multiprocessors

Proceedings of the 25th annual international symposium on Computer architecture
Wisconsin Wind Tunnel II: A Fast, Portable Parallel Architecture Simulator

IEEE Concurrency
ASCOMA: An Adaptive Hybrid Shared Memory Architecture

ICPP '98 Proceedings of the 1998 International Conference on Parallel Processing
Using simple page placement policies to reduce the cost of cache fills in coherent shared-memory systems

IPPS '95 Proceedings of the 9th International Symposium on Parallel Processing
A highly scalable system utilizing up to 128 PA-RISC processors

COMPCON '95 Proceedings of the 40th IEEE Computer Society International Conference
Reducing Remote Conflict Misses: NUMA with Remote Cache versus COMA

HPCA '97 Proceedings of the 3rd IEEE Symposium on High-Performance Computer Architecture
PRISM: An Integrated Architecture for Scalable Shared Memory

HPCA '98 Proceedings of the 4th International Symposium on High-Performance Computer Architecture
The Effectiveness of SRAM Network Caches in Clustered DSMs

HPCA '98 Proceedings of the 4th International Symposium on High-Performance Computer Architecture
WildFire: A Scalable Path for SMPs

HPCA '99 Proceedings of the 5th International Symposium on High Performance Computer Architecture

Quantified Score

Hi-index	0.00

Visualization

Abstract

In this paper, we compare and contrast two techniques to improve capacity/conflict miss traffic in CC-NUMA DSM clusters. Page migration/replication optimizes read-write accesses to a page used by a single processor by migrating the page to that processor and replicates all read-shared pages in the sharers' local memories. R-NUMA optimizes read-write accesses to any page by allowing a processor to cache that page in its main memory. Page migration/replication requires less hardware complexity as compared to R-NUMA, but has limited applicability and incurs much higher overheads even with tuned hardware/software support.In this paper, we compare and contrast page migration/replication and R-NUMA on simulated clusters of symmetric multiprocessors executing shared-memory applications. Our results show that: (1) both page migration/replication and R-NUMA significantly improve the system performance over “first-touch” migration in many applications, (2) page migration/replication has limited opportunity and can not eliminate all the capacity/conflict misses even with fast hardware support and unlimited amount of memory, (3) R-NUMA always performs best given a page cache large enough to fit an application's primary working set and subsumes page migration/replication, (4) R-NUMA benefits more from hardware support to accelerate page operations than page migration/replication, and (5) integrating page migration/replication into R-NUMA to help reduce the hardware cost requires sophisticated mechanisms and policies to select candidates for page migration/replication.