Cache coherence protocols: evaluation using a multiprocessor simulation model
ACM Transactions on Computer Systems (TOCS)
ISCA '88 Proceedings of the 15th Annual International Symposium on Computer architecture
Algorithms for scalable synchronization on shared-memory multiprocessors
ACM Transactions on Computer Systems (TOCS)
Cache Invalidation Patterns in Shared-Memory Multiprocessors
IEEE Transactions on Computers
Design and evaluation of a compiler algorithm for prefetching
ASPLOS V Proceedings of the fifth international conference on Architectural support for programming languages and operating systems
A performance evaluation of optimal hybrid cache coherency protocols
ASPLOS V Proceedings of the fifth international conference on Architectural support for programming languages and operating systems
The detection and elimination of useless misses in multiprocessors
ISCA '93 Proceedings of the 20th annual international symposium on computer architecture
Combined performance gains of simple cache protocol extensions
ISCA '94 Proceedings of the 21st annual international symposium on Computer architecture
The Stanford FLASH multiprocessor
ISCA '94 Proceedings of the 21st annual international symposium on Computer architecture
Tempest and typhoon: user-level shared memory
ISCA '94 Proceedings of the 21st annual international symposium on Computer architecture
Essential misses and data traffic in coherence protocols
Journal of Parallel and Distributed Computing - Special issue on distributed shared memory systems
The MIT Alewife machine: architecture and performance
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
The SPLASH-2 programs: characterization and methodological considerations
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
Application and architectural bottlenecks in large scale distributed shared memory machines
ISCA '96 Proceedings of the 23rd annual international symposium on Computer architecture
The directory-based cache coherence protocol for the DASH multiprocessor
ISCA '90 Proceedings of the 17th annual international symposium on Computer Architecture
False Sharing and Spatial Locality in Multiprocessor Caches
IEEE Transactions on Computers
The DASH Prototype: Logic Overhead and Performance
IEEE Transactions on Parallel and Distributed Systems
Categorizing Network Traffic in Update-Based Protocols on Scalable Multiprocessors
IPPS '96 Proceedings of the 10th International Parallel Processing Symposium
MINT: A Front End for Efficient Simulation of Shared-Memory Multiprocessors
MASCOTS '94 Proceedings of the Second International Workshop on Modeling, Analysis, and Simulation On Computer and Telecommunication Systems
An Evaluation of Fine-Grain Producer-Initiated Communication in Cache-Coherent Multiprocessors
HPCA '97 Proceedings of the 3rd IEEE Symposium on High-Performance Computer Architecture
SS '95 Proceedings of the 28th Annual Simulation Symposium
Hi-index | 0.00 |
The different implementations of parallel programming constructs interact heavily with a multiprocessor's coherence protocol and thus may have a significant impact on performance. The form and extent of this interaction have not been established so far however, particularly in the case of update-based coherence protocols. In this paper we study the running time and communication behavior of ticket and MCS spin locks; centralized, dissemination, and tree-based barriers; parallel and sequential reductions; linear broadcasting and producer and consumer-driven logarithmic broadcasting; and centralized and distributed task queues, under pure and competitive update coherence protocols on a scalable multiprocessor; results for a write invalidate protocol are presented mostly for comparison purposes. Our experiments indicate that parallel programming techniques that are well-established for write invalidate protocols, such as MCS locks and task queues, are often inappropriate for update-based protocols. In contrast, techniques such as dissemination and tree barriers achieve superior performance under update-based protocols. Our results also show that update-based protocols sometimes lead to different design decisions than write invalidate protocols. Our main conclusion is that indeed the interaction of the parallel programming constructs with the multiprocessor's coherence protocol has a significant impact on performance. The implementation of these constructs must be carefully matched to the coherence protocol if ideal performance is to be achieved.