The Effects of Latency, Occupancy, and Bandwidth in Distributed Shared Memory Multiprocessors

Authors:
Chris Holt;Mark Heinrich;Jaswinder P Singh;Edward Rothberg;John Hennessy
Venue:
The Effects of Latency, Occupancy, and Bandwidth in Distributed Shared Memory Multiprocessors
Year:
1995

Abstract

Distributed shared memory (DSM) machines can be characterized by four parameters, based on a slightly modified version of the LogP model. The l (latency) and o (occupancy of the communication controller) parameters are the keys to performance in these machines, and are largely determined by major architectural decisions about the aggressiveness and customization of the node and network. For recent and upcoming machines, the g (gap) parameter, which measures node-to-network bandwidth, does not appear to be a bottleneck. Conventional wisdom holds that latency is the dominant factor in determining the performance of a DSM machine. We show, however, that controller occupancy, which causes contention even in highly optimized applications, plays a major role, especially at low latencies. When latency hiding is used, occupancy becomes more critical, even in machines with high-latency networks. Scaling the problem size is often used as a technique to overcome limitations in communication latency and bandwidth. We show that in many structured computations, occupancy-induced contention is not alleviated by increasing problem size, and that there are important classes of applications for which the performance lost by using higher-latency networks or higher-occupancy controllers cannot be regained easily, if at all, by scaling the problem size.
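To make the interaction between l and o concrete, the following is a minimal back-of-the-envelope sketch, not the simulation methodology or the modified LogP model used in the paper. It assumes a remote miss crosses the network twice (request and reply) and visits one communication controller, which is modeled as an M/D/1 queue with fixed service time o. The function names, the queueing model, and all numeric values are illustrative assumptions.

```python
def controller_wait(occupancy, request_rate):
    """Mean queueing delay at one controller (assumed M/D/1 queue).

    occupancy    -- controller service time per request, in cycles (the 'o' parameter)
    request_rate -- requests arriving at the controller per cycle (assumed Poisson)
    """
    utilization = request_rate * occupancy
    if utilization >= 1.0:
        raise ValueError("controller is saturated (utilization >= 1)")
    # Pollaczek-Khinchine mean wait for deterministic service times.
    return utilization * occupancy / (2.0 * (1.0 - utilization))


def remote_miss_time(latency, occupancy, request_rate):
    """Estimated remote miss time under the assumptions stated above.

    latency -- one-way network latency in cycles (the 'l' parameter)
    """
    # Two network traversals, one controller visit, plus contention delay.
    return 2.0 * latency + occupancy + controller_wait(occupancy, request_rate)


def occupancy_share(latency, occupancy, request_rate):
    """Fraction of the miss time attributable to the controller (service + queueing)."""
    total = remote_miss_time(latency, occupancy, request_rate)
    return (total - 2.0 * latency) / total


if __name__ == "__main__":
    # Hypothetical parameter values, chosen only to show the trend.
    for latency in (25, 50, 200):
        share = occupancy_share(latency, occupancy=100, request_rate=0.005)
        print(f"l={latency:4d} cycles: controller share of miss time = {share:.2f}")
```

Under these assumptions, the controller's share of the miss time grows as network latency shrinks, which is consistent with the abstract's claim that occupancy matters most at low latencies. The model ignores latency hiding, network contention, and the g parameter entirely.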