A comparative evaluation of hybrid distributed shared-memory systems

Authors:
Adrian Moga; Michel Dubois
Affiliations:
Department of Electrical Engineering, University of Southern California, Los Angeles, CA 90089-2562, United States and Intel, 2111 NE 25th Ave., JF5-256, Hillsboro, OR 97124, United States; Department of Electrical Engineering, University of Southern California, Los Angeles, CA 90089-2562, United States
Venue:
Journal of Systems Architecture: the EUROMICRO Journal
Year:
2009



Abstract

Distributed shared-memory (DSM) systems are shared-memory multiprocessor architectures in which each processor node contains a partition of the shared memory. In hybrid DSM systems, coherence among caches is maintained by a software-implemented coherence protocol that relies on some hardware support. Hardware support is provided to satisfy every node hit (the common case), and software is invoked only for accesses to remote nodes. In this paper we compare the design and performance of four hybrid DSM organizations by detailed simulation of the same hardware platform. We have implemented the software protocol handlers for the four architectures. The handlers are written in C and assembly code, and coherence transactions are executed in trap and interrupt handlers. Together with the application, the handlers are executed in full detail in execution-driven simulations of six complete benchmarks with coarse-grain and fine-grain sharing. We relate our experience implementing and simulating the software protocols for the four architectures. Because the overhead of remote accesses is very high in hybrid systems, the system of choice differs from that for purely hardware systems.
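The division of labor described in the abstract (node hits served by hardware, remote accesses handed to a software handler) can be illustrated with a toy model. The following is a minimal sketch under stated assumptions, not the paper's implementation: the names `node_t`, `home_of`, `remote_miss_handler`, and `access_block`, the node and block counts, and the "mark the block valid" behavior are all hypothetical, and a real handler would exchange protocol messages with the home node.

```c
#include <assert.h>

#define NODES 4                 /* assumed number of processor nodes */
#define BLOCKS_PER_NODE 8       /* assumed size of each memory partition */
#define TOTAL_BLOCKS (NODES * BLOCKS_PER_NODE)

/* Hypothetical per-node state for the sketch. */
typedef struct {
    int  valid[TOTAL_BLOCKS];   /* 1 if the block can be served inside the node */
    long hw_hits;               /* accesses satisfied without software */
    long sw_traps;              /* accesses that invoked the software handler */
} node_t;

static node_t nodes[NODES];

/* Each node is the home of one contiguous partition of the shared memory. */
static int home_of(int block) {
    return block / BLOCKS_PER_NODE;
}

/* Initially a node can serve only the blocks of its own partition. */
static void init_nodes(void) {
    for (int n = 0; n < NODES; n++) {
        nodes[n].hw_hits = 0;
        nodes[n].sw_traps = 0;
        for (int b = 0; b < TOTAL_BLOCKS; b++) {
            nodes[n].valid[b] = (home_of(b) == n);
        }
    }
}

/* Stand-in for a trap handler: it only records the trap and marks the
 * block valid locally. Message exchange and directory updates are omitted. */
static void remote_miss_handler(int node, int block) {
    nodes[node].sw_traps++;
    nodes[node].valid[block] = 1;
}

/* One memory access: the common case is a node hit; otherwise trap to software. */
static void access_block(int node, int block) {
    assert(node >= 0 && node < NODES);
    assert(block >= 0 && block < TOTAL_BLOCKS);
    if (nodes[node].valid[block]) {
        nodes[node].hw_hits++;
    } else {
        remote_miss_handler(node, block);
    }
}
```

In this model, after `init_nodes()`, node 0 accessing block 3 (its own partition) counts as a hardware hit, a first access to block 12 (homed at node 1) invokes the handler once, and a repeated access to block 12 is then a hit. The sketch ignores invalidations, replacement, and handler latency, which are the aspects the paper actually evaluates.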