An evaluation of directory schemes for cache coherence
ISCA '88 Proceedings of the 15th Annual International Symposium on Computer architecture
Analysis of cache invalidation patterns in multiprocessors
ASPLOS III Proceedings of the third international conference on Architectural support for programming languages and operating systems
Experimental comparison of memory management policies for NUMA multiprocessors
ACM Transactions on Computer Systems (TOCS)
The Stanford Dash Multiprocessor
Computer
LogP: towards a realistic model of parallel computation
PPOPP '93 Proceedings of the fourth ACM SIGPLAN symposium on Principles and practice of parallel programming
The Stanford FLASH multiprocessor
ISCA '94 Proceedings of the 21st annual international symposium on Computer architecture
The performance impact of flexibility in the Stanford FLASH multiprocessor
ASPLOS VI Proceedings of the sixth international conference on Architectural support for programming languages and operating systems
Fast, contention-free combining tree barriers for shared-memory multiprocessors
International Journal of Parallel Programming
The MIT Alewife machine: architecture and performance
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
The SPLASH-2 programs: characterization and methodological considerations
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
S-connect: from networks of workstations to supercomputer performance
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
Zero-cycle loads: microarchitecture support for reducing load latency
Proceedings of the 28th annual international symposium on Microarchitecture
Decoupled hardware support for distributed shared memory
ISCA '96 Proceedings of the 23rd annual international symposium on Computer architecture
STiNG: a CC-NUMA computer system for the commercial marketplace
ISCA '96 Proceedings of the 23rd annual international symposium on Computer architecture
Performance benefits of virtual channels and adaptive routing: an application-driven study
ICS '97 Proceedings of the 11th international conference on Supercomputing
Efficient synchronization: let them eat QOLB
Proceedings of the 24th annual international symposium on Computer architecture
The SGI Origin: a ccNUMA highly scalable server
Proceedings of the 24th annual international symposium on Computer architecture
Designing Tree-Based Barrier Synchronization on 2D Mesh Networks
IEEE Transactions on Parallel and Distributed Systems
Using CSIM to model complex systems
WSC '88 Proceedings of the 20th conference on Winter simulation
Memory consistency and event ordering in scalable shared-memory multiprocessors
ISCA '90 Proceedings of the 17th annual international symposium on Computer Architecture
Earthquake ground motion modeling on parallel computers
Supercomputing '96 Proceedings of the 1996 ACM/IEEE conference on Supercomputing
Parallel Computer Architecture: A Hardware/Software Approach
Parallel Computer Architecture: A Hardware/Software Approach
Interconnection Networks: An Engineering Approach
Interconnection Networks: An Engineering Approach
Scalable Shared-Memory Multiprocessing
Scalable Shared-Memory Multiprocessing
Multicast Communication in Multicomputer Networks
IEEE Transactions on Parallel and Distributed Systems
How Much Does Network Contention Affect Distributed Shared Memory Performance?
ICPP '97 Proceedings of the international Conference on Parallel Processing
Impact of Adaptivity on the Behaviour of Networks of Workstations under Bursty Traffic
ICPP '98 Proceedings of the 1998 International Conference on Parallel Processing
Multidestination Message Passing Mechanism Conforming to Base Wormhole Routing Scheme
PCRCW '94 Proceedings of the First International Workshop on Parallel Computer Routing and Communication
Turn grouping for efficient multicast in wormhole mesh networks
FRONTIERS '96 Proceedings of the 6th Symposium on the Frontiers of Massively Parallel Computation
Origin 2000 Design Enhancements for Communication Intensive Applications
PACT '98 Proceedings of the 1998 International Conference on Parallel Architectures and Compilation Techniques
SPDP '96 Proceedings of the 8th IEEE Symposium on Parallel and Distributed Processing (SPDP '96)
The Effects of Latency, Occupancy, and Bandwidth in Distributed Shared Memory Multiprocessors
The Effects of Latency, Occupancy, and Bandwidth in Distributed Shared Memory Multiprocessors
Hi-index | 0.00 |
In distributed shared-memory (DSM) multiprocessors, a write operation requires multiple messages to invalidate the nodes which share and cache the memory block to being written. The consequent write stall time impedes the performance of such systems. An effective means of achieving efficient invalidation is to employ multicast messages to reach the sharing nodes. This study evaluates two multicast-based invalidation schemes, dual-path and pruning, by performing application-driven simulation. The experimental settings used herein find that multicasts improve invalidation traffic for four of the six evaluated real applications. The remaining two applications are computationally intensive, and multicast-based invalidation is less effective. However, since multicasts encourage bursty communication, our results indicate that they help relieve network congestion during these periods. Dual-path performs slightly better than pruning, because it is less sensitive to routing delay in the routers. Our results further demonstrate that cache size is an important design parameter for multicast-based invalidation, and is highly effective for DSM multiprocessors with larger caches.