Evaluating the Effect of Coherence Protocols on the Performance of Parallel Programming Constructs

Authors:
Ricardo Bianchini;Enrique V. Carrera;Leonidas Kontothanassis
Affiliations:
-;-;-
Venue:
International Journal of Parallel Programming
Year:
1998

Citing 21
Cited 0

Cache coherence protocols: evaluation using a multiprocessor simulation model

ACM Transactions on Computer Systems (TOCS)
A characterization of sharing in parallel programs and its application to coherency protocol evaluation

ISCA '88 Proceedings of the 15th Annual International Symposium on Computer architecture
Algorithms for scalable synchronization on shared-memory multiprocessors

ACM Transactions on Computer Systems (TOCS)
Cache Invalidation Patterns in Shared-Memory Multiprocessors

IEEE Transactions on Computers
Design and evaluation of a compiler algorithm for prefetching

ASPLOS V Proceedings of the fifth international conference on Architectural support for programming languages and operating systems
A performance evaluation of optimal hybrid cache coherency protocols

ASPLOS V Proceedings of the fifth international conference on Architectural support for programming languages and operating systems
The detection and elimination of useless misses in multiprocessors

ISCA '93 Proceedings of the 20th annual international symposium on computer architecture
Combined performance gains of simple cache protocol extensions

ISCA '94 Proceedings of the 21st annual international symposium on Computer architecture
The Stanford FLASH multiprocessor

ISCA '94 Proceedings of the 21st annual international symposium on Computer architecture
Tempest and typhoon: user-level shared memory

ISCA '94 Proceedings of the 21st annual international symposium on Computer architecture
Essential misses and data traffic in coherence protocols

Journal of Parallel and Distributed Computing - Special issue on distributed shared memory systems
The MIT Alewife machine: architecture and performance

ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
The SPLASH-2 programs: characterization and methodological considerations

ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
Application and architectural bottlenecks in large scale distributed shared memory machines

ISCA '96 Proceedings of the 23rd annual international symposium on Computer architecture
The directory-based cache coherence protocol for the DASH multiprocessor

ISCA '90 Proceedings of the 17th annual international symposium on Computer Architecture
False Sharing and Spatial Locality in Multiprocessor Caches

IEEE Transactions on Computers
The DASH Prototype: Logic Overhead and Performance

IEEE Transactions on Parallel and Distributed Systems
Categorizing Network Traffic in Update-Based Protocols on Scalable Multiprocessors

IPPS '96 Proceedings of the 10th International Parallel Processing Symposium
MINT: A Front End for Efficient Simulation of Shared-Memory Multiprocessors

MASCOTS '94 Proceedings of the Second International Workshop on Modeling, Analysis, and Simulation On Computer and Telecommunication Systems
An Evaluation of Fine-Grain Producer-Initiated Communication in Cache-Coherent Multiprocessors

HPCA '97 Proceedings of the 3rd IEEE Symposium on High-Performance Computer Architecture
Algorithms for categorizing multiprocessor communication under invalidate and update-based coherence protocols

SS '95 Proceedings of the 28th Annual Simulation Symposium

Quantified Score

Hi-index	0.00

Visualization

Abstract

The different implementations of parallel programming constructs interact heavily with a multiprocessor's coherence protocol and thus may have a significant impact on performance. The form and extent of this interaction have not been established so far however, particularly in the case of update-based coherence protocols. In this paper we study the running time and communication behavior of ticket and MCS spin locks; centralized, dissemination, and tree-based barriers; parallel and sequential reductions; linear broadcasting and producer and consumer-driven logarithmic broadcasting; and centralized and distributed task queues, under pure and competitive update coherence protocols on a scalable multiprocessor; results for a write invalidate protocol are presented mostly for comparison purposes. Our experiments indicate that parallel programming techniques that are well-established for write invalidate protocols, such as MCS locks and task queues, are often inappropriate for update-based protocols. In contrast, techniques such as dissemination and tree barriers achieve superior performance under update-based protocols. Our results also show that update-based protocols sometimes lead to different design decisions than write invalidate protocols. Our main conclusion is that indeed the interaction of the parallel programming constructs with the multiprocessor's coherence protocol has a significant impact on performance. The implementation of these constructs must be carefully matched to the coherence protocol if ideal performance is to be achieved.