Performance Analysis of Four Memory Consistency Models for Multithreaded Multiprocessors

Authors:
Yong-Kim Chong;Kai Hwang
Affiliations:
-;-
Venue:
IEEE Transactions on Parallel and Distributed Systems
Year:
1995

Citing 21
Cited 1

A class of generalized stochastic Petri nets for the performance evaluation of multiprocessor systems

ACM Transactions on Computer Systems (TOCS)
Memory access buffering in multiprocessors

ISCA '86 Proceedings of the 13th annual international symposium on Computer architecture
Exploring the benefits of multiple hardware contexts in a multiprocessor architecture: preliminary results

ISCA '89 Proceedings of the 16th annual international symposium on Computer architecture
Memory coherence in shared virtual memory systems

ACM Transactions on Computer Systems (TOCS)
Memory Access Dependencies in Shared-Memory Multiprocessors

IEEE Transactions on Software Engineering
Analysis of multithreaded architectures for parallel computing

SPAA '90 Proceedings of the second annual ACM symposium on Parallel algorithms and architectures
Performance evaluation of memory consistency models for shared-memory multiprocessors

ASPLOS IV Proceedings of the fourth international conference on Architectural support for programming languages and operating systems
Tolerating latency through software-controlled prefetching in shared-memory multiprocessors

Journal of Parallel and Distributed Computing - Special issue on shared-memory multiprocessors
Distributed Shared Memory: A Survey of Issues and Algorithms

Computer - Distributed computing systems: separate resources acting as one
Comparative evaluation of latency reducing and tolerating techniques

ISCA '91 Proceedings of the 18th annual international symposium on Computer architecture
The Stanford Dash Multiprocessor

Computer
Ultracomputers: a teraflop before its time

Communications of the ACM
A performance study of memory consistency models

ISCA '92 Proceedings of the 19th annual international symposium on Computer architecture
Waiting algorithms for synchronization in large-scale multiprocessors

ACM Transactions on Computer Systems (TOCS)
Performance and optimization of data prefetching strategies in scalable multiprocessors

Journal of Parallel and Distributed Computing - Special issue on scalability of parallel algorithms and architectures
The Tera computer system

ICS '90 Proceedings of the 4th international conference on Supercomputing
Memory consistency and event ordering in scalable shared-memory multiprocessors

ISCA '90 Proceedings of the 17th annual international symposium on Computer Architecture
Advanced Computer Architecture: Parallelism,Scalability,Programmability

Advanced Computer Architecture: Parallelism,Scalability,Programmability
Performance Tradeoffs in Multithreaded Processors

IEEE Transactions on Parallel and Distributed Systems
A Unified Formalization of Four Shared-Memory Models

IEEE Transactions on Parallel and Distributed Systems
An Analytical Solution for a Markov Chain Modeling Multithreaded

An Analytical Solution for a Markov Chain Modeling Multithreaded

Multiprocessor, Multithreading and Memory Optimization for On-Chip Multimedia Applications

Journal of Signal Processing Systems

Quantified Score

Hi-index	0.00

Visualization

Abstract

Stochastic timed Petri nets are developed to evaluate the relative performance of distributed shared memory models for scalable multiprocessors, using multithreaded processors as building blocks. Four shared memory models are evaluated: the Sequential Consistency (SC) model by Lamport (1979), the Weak Consistency (WC) model by Dubois et al. (1986), the Processor Consistency (PC) model by Goodman (1989), and the Release Consistency (RC) model by Gharachorloo et al. (1990). We assumed a scalable network with a sufficient bandwidth to absorb the increased traffic from multithreading, coherent caches, and memory event reordering. The embedded Markov chains are solved to reveal the performance attributes. Under saturated conditions, we find that multithreading contributes more than 50% of the performance improvement, while the improvement from memory consistency models varies between 20% to 40% of the total performance gain. Petri net models are effective to predict the performance of processors with a larger number of contexts than that can be simulated in previous benchmark studies. The accuracy of these memory performance models was validated with the simulation results from Stanford University. Our analytical results reveal the lowest performance of the SC model amongst four memory consistency models. The PC model requires to use larger write buffers, while the WC and RC models require smaller write buffers. The PC model may perform even lower than the SC model, if a small buffer was used. The performance of the WC model depends heavily on the synchronization rate in user code. For a low synchronization rate, the WC model performs as well as the RC model. With sufficient multithreading and network bandwidth, the RC model shows the best performance among the four models. Furthermore, we discovered that cache interferences cause very little performance degradation in all relaxed memory consistency models; as long as the network is contention-free even when multithreading has saturated the system.