Hardware Versus Software Implementation of COMA

Authors:
Adrian Moga;Michel Dubois;Alain Gefflaut
Affiliations:
-;-;-
Venue:
ICPP '97 Proceedings of the international Conference on Parallel Processing
Year:
1997

Citing 24
Cited 5

Comparative performance evaluation of cache-coherent NUMA and COMA architectures

ISCA '92 Proceedings of the 19th annual international symposium on Computer architecture
DDM: A Cache-Only Memory Architecture

Computer
Closing the window of vulnerability in multiphase memory transactions

ASPLOS V Proceedings of the fifth international conference on Architectural support for programming languages and operating systems
Software coherence in multiprocessor memory systems

Software coherence in multiprocessor memory systems
Evaluation of release consistent software distributed shared memory on emerging network technology

ISCA '93 Proceedings of the 20th annual international symposium on computer architecture
The Stanford FLASH multiprocessor

ISCA '94 Proceedings of the 21st annual international symposium on Computer architecture
Software-extended coherent shared memory: performance and cost

ISCA '94 Proceedings of the 21st annual international symposium on Computer architecture
Tempest and typhoon: user-level shared memory

ISCA '94 Proceedings of the 21st annual international symposium on Computer architecture
The performance impact of flexibility in the Stanford FLASH multiprocessor

ASPLOS VI Proceedings of the sixth international conference on Architectural support for programming languages and operating systems
Fine-grain access control for distributed shared memory

ASPLOS VI Proceedings of the sixth international conference on Architectural support for programming languages and operating systems
Essential misses and data traffic in coherence protocols

Journal of Parallel and Distributed Computing - Special issue on distributed shared memory systems
COMA-F: a non-hierarchical cache only memory architecture

COMA-F: a non-hierarchical cache only memory architecture
The SPLASH-2 programs: characterization and methodological considerations

ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
Efficient strategies for software-only protocols in shared-memory multiprocessors

ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
Decoupled hardware support for distributed shared memory

ISCA '96 Proceedings of the 23rd annual international symposium on Computer architecture
Missing the memory wall: the case for processor/memory integration

ISCA '96 Proceedings of the 23rd annual international symposium on Computer architecture
Informing memory operations: providing memory performance feedback in modern processors

ISCA '96 Proceedings of the 23rd annual international symposium on Computer architecture
Shasta: a low overhead, software-only approach for supporting fine-grain shared memory

Proceedings of the seventh international conference on Architectural support for programming languages and operating systems
The directory-based cache coherence protocol for the DASH multiprocessor

ISCA '90 Proceedings of the 17th annual international symposium on Computer Architecture
START-NG: Delivering Seamless Parallel Computing

Euro-Par '95 Proceedings of the First International Euro-Par Conference on Parallel Processing
The Illinois Aggressive Coma Multiprocessor project (I-ACOMA)

FRONTIERS '96 Proceedings of the 6th Symposium on the Frontiers of Massively Parallel Computation
Using hints to reduce the read miss penalty for flat COMA protocols

HICSS '95 Proceedings of the 28th Hawaii International Conference on System Sciences
An argument for simple COMA

HPCA '95 Proceedings of the 1st IEEE Symposium on High-Performance Computer Architecture
The Effects of Latency, Occupancy, and Bandwidth in Distributed Shared Memory Multiprocessors

The Effects of Latency, Occupancy, and Bandwidth in Distributed Shared Memory Multiprocessors

Options for dynamic address translation in COMAs

Proceedings of the 25th annual international symposium on Computer architecture
Tolerating late memory traps in ILP processors

ISCA '99 Proceedings of the 26th annual international symposium on Computer architecture
Tolerating Late Memory Traps in Dynamically Scheduled Processors

IEEE Transactions on Computers
A comparative evaluation of hardware-only and software-only directory protocols in shared-memory multiprocessors

Journal of Systems Architecture: the EUROMICRO Journal
Moving Address Translation Closer to Memory in Distributed Shared-Memory Multiprocessors

IEEE Transactions on Parallel and Distributed Systems

Quantified Score

Hi-index	0.00

Visualization

Abstract

Traditionally, cache coherence in multiprocessors has been maintained in hardware. However, the cost-effectiveness of hardwired protocols is questionable. Virtual Shared Memory systems have highlighted the many advantages of software-implemented protocols, albeit at a performance price. The performance gap is narrowed by hybrid systems with the addition of hardware support for fine-grain sharing. We have developed a software protocol for a COMA (Cache-Only Memory Architecture). We call the system SC-COMA for Software-Controlled COMA, to emphasize that the protocol engine is emulated by software executed on the main processor. Contrary to user-level protocols, the software handling coherence events in SC-COMA runs in sub-kernel mode, transparently providing the same services to applications as a hardware counterpart. The software emulation layer has been written and we compare SC-COMA to an idealized hardware COMA through detailed simulations. Our results show that SC-COMA is competitive. On systems with 32 processors, it achieves a slowdown of 11-56% with respect to its hardware counterpart, across a range of applications and memory pressures. SC-COMA scales well, up to 32 nodes. A study on the impact of faster processors on SC-COMA's relative performance indicates a consistent improvement, but with a limitation due to the loosely-integrated design. We conclude that SC-COMA is a viable solution to easily transform networks of workstations into powerful multiprocessors.