Proceedings of the 25th annual international symposium on Computer architecture
Flexible use of memory for replication/migration in cache-coherent DSM multiprocessors
Proceedings of the 25th annual international symposium on Computer architecture
Excel-NUMA: Toward Programmability, Simplicity, and High Performance
IEEE Transactions on Computers - Special issue on cache memory and related problems
Efficient management of memory hierarchies in embedded DRAM systems
ICS '99 Proceedings of the 13th international conference on Supercomputing
Data management in networks: experimental evaluation of a provably good strategy
Proceedings of the eleventh annual ACM symposium on Parallel algorithms and architectures
Proceedings of the twelfth annual ACM symposium on Parallel algorithms and architectures
Design and Evaluation of a Switch Cache Architecture for CC-NUMA Multiprocessors
IEEE Transactions on Computers
Exploiting Network Locality for CC-NUMA Multiprocessors
The Journal of Supercomputing
Proceedings of the 2001 ACM symposium on Applied computing
Cache-Only Memory Architectures
Computer
Victim Replication: Maximizing Capacity while Hiding Wire Delay in Tiled Chip Multiprocessors
Proceedings of the 32nd annual international symposium on Computer Architecture
ASR: Adaptive Selective Replication for CMP Caches
Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
Proximity-aware directory-based coherence for multi-core processor architectures
Proceedings of the nineteenth annual ACM symposium on Parallel algorithms and architectures
SimICS/sun4m: a virtual workstation
ATEC '98 Proceedings of the annual conference on USENIX Annual Technical Conference
A comparative evaluation of hybrid distributed shared-memory systems
Journal of Systems Architecture: the EUROMICRO Journal
Reactive NUCA: near-optimal block placement and replication in distributed caches
Proceedings of the 36th annual international symposium on Computer architecture
Hi-index | 0.00 |
Many future applications for scalable shared-memory multiprocessors are likely to have large working sets that overflow secondary or tertiary caches. Two possible solutions to this problem are to add a very large cache called remote cache that caches remote data (NUMA-RC), or organize the machine as a cache-only memory architecture (COMA). This paper tries to determine which solution is best. To compare the performance of the two organizations for the same amount of total memory, we introduce a model of data sharing. The model uses three data sharing patterns: replication, read-mostly migration, and read-write migration. Replication data is accessed in read-mostly mode by several processors, while migration data is accessed largely by one processor at a time. For large working sets, the weight of the migration data largely determines whether COMA outperforms NUMA-RC. Ideally, COMA only needs to fit the replication data in its extra memory; the migration data will simply be swapped between attraction memories. The remote cache of NUMA-RC, instead, needs to house both the replication and the migration data. However, simulations of seven Splash2 applications show that COMA does not outperform NUMA-RC. This is due to two reasons. First, the extra memory added has more associativity in NUMA-RC than in COMA and, therefore, can be utilized better by the working set in NUMA-RC. Second, COMA memory accesses are more expensive Of course, our results are affected by the applications used, which have been optimized for a cache-coherent NUMA machine. Overall, since NUMA-RC is cheaper, NUMA-RC is more cost-effective for these applications.