Cache-Only Memory Architectures

Authors: Fredrik Dahlgren; Josep Torrellas
Venue: Computer
Year: 1999

Abstract

The shared-memory concept makes it easier to write parallel programs, but tuning the application to reduce the impact of frequent long-latency memory accesses still requires substantial programmer effort. Researchers have proposed using compilers, operating systems, or architectures to improve performance by allocating data close to the processors that use it.

The Cache-Only Memory Architecture (COMA) increases the chances of data being available locally because the hardware transparently replicates the data and migrates it to the memory module of the node that is currently accessing it. Each memory module acts as a huge cache memory in which each block has a tag with the address and the state.

The authors explain the functionality, architecture, performance, and complexity of COMA systems. They also outline different COMA designs, compare COMA to traditional nonuniform memory access (NUMA) systems, and describe proposed improvements in NUMA systems that target the same performance obstacles as COMA.
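
The replication and migration behavior described in the abstract can be illustrated with a toy model. This is a minimal sketch and not the protocol of any design the article covers. The names `ToyComa`, `AttractionMemory`, `Block`, `place`, and `holders`, the two-state scheme (Shared/Exclusive), and the write-invalidate policy are all assumptions made for illustration. Memory capacity is treated as unbounded, so block replacement is not modeled.

```python
from dataclasses import dataclass, field

# Hypothetical two-state scheme; real COMA protocols use richer state sets.
SHARED, EXCLUSIVE = "Shared", "Exclusive"


@dataclass
class Block:
    state: str   # coherence state kept alongside the address tag
    value: int   # the data itself


@dataclass
class AttractionMemory:
    """One node's memory module, managed like a cache: address tag -> Block."""
    blocks: dict = field(default_factory=dict)


class ToyComa:
    def __init__(self, num_nodes):
        self.nodes = [AttractionMemory() for _ in range(num_nodes)]

    def place(self, node, addr, value):
        """Initial placement of a block (assumed helper, not part of any protocol)."""
        self.nodes[node].blocks[addr] = Block(EXCLUSIVE, value)

    def read(self, node, addr):
        """Return (value, was_local). A remote hit replicates the block locally."""
        local = self.nodes[node].blocks
        if addr in local:
            return local[addr].value, True
        for other in self.nodes:
            blk = other.blocks.get(addr)
            if blk is not None:
                blk.state = SHARED                        # supplier keeps a shared copy
                local[addr] = Block(SHARED, blk.value)    # replica lands in local memory
                return blk.value, False
        raise KeyError(addr)

    def write(self, node, addr, value):
        """Invalidate every other copy; the block migrates to the writing node."""
        for i, other in enumerate(self.nodes):
            if i != node:
                other.blocks.pop(addr, None)
        self.nodes[node].blocks[addr] = Block(EXCLUSIVE, value)

    def holders(self, addr):
        """Nodes whose memory currently holds a copy of addr."""
        return [i for i, n in enumerate(self.nodes) if addr in n.blocks]
```

In this sketch, the first read of a block from another node is remote and leaves a replica behind, so a repeated read from the same node is served locally. A write removes all other copies, leaving the block only in the writer's memory, which is the migration effect the abstract describes.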