Exploiting Network Locality for CC-NUMA Multiprocessors

Authors:
Hung-Chang Hsiao; Chung-Ta King
Affiliations:
Department of Computer Science, National Tsing Hua University, Hsinchu, Taiwan 300, R.O.C.; Department of Computer Science, National Tsing Hua University, Hsinchu, Taiwan 300, R.O.C. king@cs.nthu.edu.tw
Venue:
The Journal of Supercomputing
Year:
2001

Abstract

Rapid advances in multiprocessor interconnection networks are closing the gap between computation and communication. Given this trend, how can fast interconnects best be exploited? This study proposes an enhanced CC-NUMA architecture, called Depot-NUMA, which treats the aggregate of the private caches in all nodes as one large remote access cache. Fast interconnects allow a missing block to be fetched from the private caches of other sharing nodes rather than from its home node. The issues involved in designing Depot-NUMA are also discussed, and a novel routing scheme, called multi-hop, is proposed for communicating with the potential sharers and fetching a missing block from their private caches. The potential sharers are selected by a stride function to exploit network locality in the system. The proposed Depot-NUMA design requires only modest modifications to the node controller and coherence protocol. Additionally, the interconnect fabric can be built from existing, unmodified commodity interconnects. Finally, an application-driven study shows that Depot-NUMA can reduce read stall time by up to 41% and is competitive with a CC-NUMA with a large local cache.
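The read-miss path described above can be illustrated with a small model. This is a minimal, hypothetical Python sketch, not the authors' design: the abstract does not define the stride function, so we assume the candidate sharers are the nodes at requester + k * stride (mod N), probed in order up to a limit, with the request falling back to the home node when no candidate's private cache holds the block. The names `Node`, `stride_sharers`, `multi_hop_read`, and `max_probes` are illustrative only, and the model ignores coherence states, network topology, and timing.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical node model: `cache` holds the block addresses in its private cache."""
    node_id: int
    cache: set[int] = field(default_factory=set)


def stride_sharers(requester: int, stride: int, max_probes: int, num_nodes: int) -> list[int]:
    """Assumed stride function: candidates are requester + k * stride (mod num_nodes).

    Stops after max_probes candidates, or when the sequence cycles back to the
    requester (no node repeats before that point).
    """
    candidates = []
    for k in range(1, num_nodes):
        node = (requester + k * stride) % num_nodes
        if node == requester:
            break  # the stride has wrapped around to the requester
        candidates.append(node)
        if len(candidates) == max_probes:
            break
    return candidates


def multi_hop_read(nodes: list[Node], requester: int, block: int,
                   home: int, stride: int, max_probes: int) -> tuple[int, int]:
    """Probe candidate sharers one hop at a time; return (supplier, probes made).

    If no candidate's private cache holds the block, the request is assumed to
    be forwarded to the home node, which then supplies it from memory.
    """
    probes = 0
    for candidate in stride_sharers(requester, stride, max_probes, len(nodes)):
        probes += 1
        if block in nodes[candidate].cache:
            return candidate, probes  # served from a sharer's private cache
    return home, probes  # fall back to the home node


if __name__ == "__main__":
    nodes = [Node(i) for i in range(8)]
    nodes[3].cache.add(0x40)  # node 3 holds a shared copy of block 0x40
    print(multi_hop_read(nodes, requester=1, block=0x40, home=6, stride=2, max_probes=3))  # (3, 1)
    print(multi_hop_read(nodes, requester=1, block=0x80, home=6, stride=2, max_probes=3))  # (6, 3)
```

In this sketch a small stride keeps successive probes near the requester when node numbers follow the network layout (for example, on a ring), which is one plausible reading of "network locality"; the paper's actual stride choice and topology are not given in the abstract.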