Comparative performance evaluation of cache-coherent NUMA and COMA architectures
ISCA '92 Proceedings of the 19th annual international symposium on Computer architecture
Evaluating the memory overhead required for COMA architectures
ISCA '94 Proceedings of the 21st annual international symposium on Computer architecture
Tolerating latency through software-controlled data prefetching
Tolerating latency through software-controlled data prefetching
Evaluation of Hardware-Based Stride and Sequential Prefetching in Shared-Memory Multiprocessors
IEEE Transactions on Parallel and Distributed Systems
Operating system support for improving data locality on CC-NUMA compute servers
Proceedings of the seventh international conference on Architectural support for programming languages and operating systems
ICS '90 Proceedings of the 4th international conference on Supercomputing
Reactive NUMA: a design for unifying S-COMA and CC-NUMA
Proceedings of the 24th annual international symposium on Computer architecture
Flexible use of memory for replication/migration in cache-coherent DSM multiprocessors
Proceedings of the 25th annual international symposium on Computer architecture
HPCA '95 Proceedings of the 1st IEEE Symposium on High-Performance Computer Architecture
Reducing Remote Conflict Misses: NUMA with Remote Cache versus COMA
HPCA '97 Proceedings of the 3rd IEEE Symposium on High-Performance Computer Architecture
Enhancing Memory Use in Simple Coma: Multiplexed Simple Coma
HPCA '98 Proceedings of the 4th International Symposium on High-Performance Computer Architecture
PRISM: An Integrated Architecture for Scalable Shared Memory
HPCA '98 Proceedings of the 4th International Symposium on High-Performance Computer Architecture
The Effectiveness of SRAM Network Caches in Clustered DSMs
HPCA '98 Proceedings of the 4th International Symposium on High-Performance Computer Architecture
WildFire: A Scalable Path for SMPs
HPCA '99 Proceedings of the 5th International Symposium on High Performance Computer Architecture
Performance experiences on Sun's Wildfire prototype
SC '99 Proceedings of the 1999 ACM/IEEE conference on Supercomputing
Reducing the Replacement Overhead on COMA Protocols for Workstation-Based Architectures
Euro-Par '00 Proceedings from the 6th International Euro-Par Conference on Parallel Processing
Modeling and evaluating the time overhead induced by BER in COMA multiprocessors
Journal of Systems Architecture: the EUROMICRO Journal
Victim Replication: Maximizing Capacity while Hiding Wire Delay in Tiled Chip Multiprocessors
Proceedings of the 32nd annual international symposium on Computer Architecture
ASR: Adaptive Selective Replication for CMP Caches
Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
Proximity-aware directory-based coherence for multi-core processor architectures
Proceedings of the nineteenth annual ACM symposium on Parallel algorithms and architectures
YAARC: yet another approach to further reducing the rate of conflict misses
The Journal of Supercomputing
On-chip COMA cache-coherence protocol for microgrids of microthreaded cores
Euro-Par'07 Proceedings of the 2007 conference on Parallel processing
Hi-index | 4.10 |
The shared-memory concept makes it easier to write parallel programs, but tuning the application to reduce the impact of frequent long-latency memory accesses still requires substantial programmer effort. Researchers have proposed using compilers, operating systems, or architectures to improve performance by allocating data close to the processors that use it.The Cache-Only Memory Architecture (COMA) increases the chances of data being available locally because the hardware transparently replicates the data and migrates it to the memory module of the node that is currently accessing it. Each memory module acts as a huge cache memory in which each block has a tag with the address and the state.The authors explain the functionality, architecture, performance, and complexity of COMA systems. They also outline different COMA designs, compare COMA to traditional nonuniform memory access (NUMA) systems, and describe proposed improvements in NUMA systems that target the same performance obstacles as COMA.