Rapid advances in multiprocessor interconnection networks are closing the gap between computation and communication. Given this trend, how can fast interconnects be put to use? This study proposes an enhanced CC-NUMA architecture, called Depot-NUMA, which treats the collection of private caches in all nodes as one large remote access cache. Fast interconnects allow a missing block to be fetched from the private caches of other sharing nodes rather than from the home node. Issues involved in designing Depot-NUMA are also discussed, and a novel routing scheme, called multi-hop, is proposed to communicate with the potential sharers and fetch a missing block from their private caches. The potential sharers are selected by a stride function to exploit network locality in the system. The proposed Depot-NUMA design requires only modest modifications to the node controller and the coherence protocol, and the interconnect fabric can be built from existing, unmodified commodity interconnects. An application-driven study shows that Depot-NUMA can reduce read stall time by up to 41% and is competitive with a CC-NUMA that has a large local cache.
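The read-miss path described above can be illustrated with a toy model. This is only a sketch under stated assumptions, not the paper's design: the abstract does not define the stride function or the multi-hop routing details, so `stride_candidates`, `fetch_block`, the modular stride formula, and the `count` parameter are all hypothetical, and coherence state is ignored entirely. The sketch only shows the general idea of probing a few stride-selected nodes in sequence and falling back to the home node.

```python
from typing import Dict, Hashable, List, Set, Tuple


def stride_candidates(requester: int, num_nodes: int,
                      stride: int, count: int) -> List[int]:
    """Hypothetical stride function (an assumption, not from the paper):
    pick nodes at requester + k*stride (mod num_nodes), k = 1..count,
    skipping the requester itself and any repeated node."""
    candidates: List[int] = []
    for k in range(1, count + 1):
        node = (requester + k * stride) % num_nodes
        if node != requester and node not in candidates:
            candidates.append(node)
    return candidates


def fetch_block(block: Hashable, requester: int, home: int,
                caches: Dict[int, Set[Hashable]], num_nodes: int,
                stride: int = 1, count: int = 2) -> Tuple[int, List[int]]:
    """Toy read-miss handler: visit the stride-selected potential sharers
    one after another (a stand-in for multi-hop forwarding) and return the
    supplying node plus the list of nodes visited. If no candidate holds
    the block, the request ends at the home node."""
    path: List[int] = []
    for node in stride_candidates(requester, num_nodes, stride, count):
        path.append(node)
        if block in caches.get(node, set()):
            return node, path  # supplied from a peer's private cache
    path.append(home)
    return home, path  # fallback: conventional CC-NUMA home fetch


if __name__ == "__main__":
    # Node 2 holds block "B"; node 0 misses on it; node 5 is the home node.
    caches = {1: {"A"}, 2: {"B"}}
    print(fetch_block("B", requester=0, home=5, caches=caches, num_nodes=8))
    print(fetch_block("C", requester=0, home=5, caches=caches, num_nodes=8))
```

In this model a hit in a nearby private cache ends the request before it reaches the home node, which is the effect the abstract attributes to fast interconnects; the real design must also keep the coherence protocol correct, which the sketch does not attempt.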