Exploring Virtual Network Selection Algorithms in DSM Cache Coherence Protocols

Authors:
Mainak Chaudhuri;Mark Heinrich
Affiliations:
-;IEEE
Venue:
IEEE Transactions on Parallel and Distributed Systems
Year:
2004

Abstract

Distributed shared memory (DSM) multiprocessors typically require disjoint networks for deadlock-free execution of cache coherence protocols. This is normally achieved by implementing virtual networks with the help of virtual channels or virtual lanes multiplexed on a single physical network. To keep the coherence protocol simple, messages are usually assigned to virtual lanes in a predefined static manner based on a cycle-free lane assignment dependence graph. However, this static split of virtual networks (such as request and reply networks) may lead to underutilization of certain virtual networks while saturating the other networks. In this paper, we explore different static and dynamic schemes to select the virtual lanes for outgoing messages and mix the load among them without restricting any particular type of message to be carried only by a particular virtual network. We achieve this by exposing the selection algorithms to the coherence protocol itself, so that it can inject messages into selected virtual lanes based on some local information, and still enjoy deadlock-freedom. Our execution-driven simulation on five applications from the SPLASH-2 suite shows that as the system scales, the virtual network selection algorithms play an important role. For 128-node systems, our dynamic selection algorithm speeds up parallel execution by as much as 22 percent over an optimized baseline system running a modified SGI Origin 2000 protocol. We also explore how network latency, the number of message buffers per virtual lane, and the depth of network interface output queues affect the relative performance of various virtual lane selection algorithms.
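The contrast between a static lane assignment and a dynamic one driven by local information can be illustrated with a small sketch. This is not the paper's algorithm: the lane count, the message classes, the `allowed_lanes` parameter, and the shortest-queue heuristic are all assumptions made for illustration, and the sketch does not attempt to show how deadlock-freedom is preserved.

```python
from collections import deque

# Hypothetical lane count and message classes; not taken from the paper.
NUM_LANES = 4
STATIC_LANE = {"request": 0, "reply": 1}


def select_static(msg_type):
    """Static scheme: the lane is fixed by the message class."""
    return STATIC_LANE[msg_type]


def select_dynamic(queues, allowed_lanes):
    """Dynamic scheme (illustrative heuristic): among the lanes this message
    may use, pick the one whose local output queue is currently shortest.
    Ties go to the lowest-numbered lane."""
    return min(allowed_lanes, key=lambda lane: (len(queues[lane]), lane))


def inject(queues, msg, lane):
    """Append the message to the chosen lane's output queue."""
    queues[lane].append(msg)


queues = [deque() for _ in range(NUM_LANES)]

# Preload lane 0 to mimic a heavily loaded request network.
for i in range(3):
    inject(queues, f"req{i}", select_static("request"))

# A new request that is assumed to be permitted on lanes 0 and 2
# is steered to the emptier lane 2 instead of the loaded lane 0.
lane = select_dynamic(queues, allowed_lanes=[0, 2])
inject(queues, "req3", lane)
```

Under the static scheme the fourth request would also queue behind the first three on lane 0; the dynamic choice spreads the load using only the sender's own queue occupancy, which is the kind of local information the abstract refers to.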