IEEE Transactions on Parallel and Distributed Systems
In this paper, we introduce new analytical models for predicting the performance of parallel applications under various cache coherence protocol assumptions. The purpose of these models is to determine which protocol should be used for each data block and, for dynamic protocols, when to switch protocols. Although we focus on tightly-coupled multiprocessor systems, similar models can be derived for loosely-coupled distributed systems, such as networks of workstations.

Our models are unique in that they lie between two large bodies of prior work: theoretical models that assume independence and a uniform distribution of memory accesses across processors, and address-trace-oriented models that assume a precise characterization of the interleaving of memory accesses is available. The former are not very realistic, and the latter are not suitable for use at compile time or run time. In contrast, our models let us choose different input parameters depending on how the models will be used and on the accuracy required of the performance prediction.

We present the models and show how the required parameters can be obtained. We assess the accuracy of our models on 15 parallel applications. For these applications, our most complete model predicts performance within a 10 percent margin of a simulation of a sequentially consistent multiprocessor system. As part of this study, we also show the potential advantage of using dynamic hybrid protocols.
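The abstract does not give the models' equations. As a rough illustration of the two decisions it describes (which protocol to use for a block, and when a dynamic protocol should switch), here is a minimal sketch that assumes a deliberately crude per-block cost model counted in bus transactions. `BlockProfile`, its fields, the two cost formulas, and `plan_phases` are all hypothetical and are not the paper's models. Under this assumption, write-invalidate pays one invalidation per write run plus one read miss per processor that re-reads the block afterward, while write-update pays one broadcast per write. The phase planner is a small dynamic program that adds a fixed, assumed cost for every protocol switch.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class BlockProfile:
    """Hypothetical per-block sharing summary (NOT the paper's parameter set)."""
    writes: int        # total writes to the block
    write_runs: int    # maximal runs of writes by one processor, uninterrupted by others
    rereaders: float   # avg. number of other processors that read the block between runs


def invalidate_cost(p: BlockProfile) -> float:
    # Assumption: one invalidation per write run, then one read miss per re-reader.
    return p.write_runs * (1 + p.rereaders)


def update_cost(p: BlockProfile) -> float:
    # Assumption: the block is actively shared, so every write is broadcast
    # to the other copies and readers never miss.
    return p.writes


COST_FNS = {"invalidate": invalidate_cost, "update": update_cost}


def choose_protocol(p: BlockProfile) -> str:
    """Static choice: the cheaper protocol for this block under the toy model."""
    return "invalidate" if invalidate_cost(p) <= update_cost(p) else "update"


def plan_phases(phases: List[BlockProfile], switch_cost: float) -> Tuple[float, List[str]]:
    """Dynamic choice: cheapest per-phase protocol schedule, charging switch_cost
    (a hypothetical constant) each time the protocol changes between phases."""
    if not phases:
        raise ValueError("need at least one phase")
    # best[p] = (total cost, schedule) of the cheapest plan whose current phase uses p
    best = {p: (fn(phases[0]), [p]) for p, fn in COST_FNS.items()}
    for phase in phases[1:]:
        new_best = {}
        for p, fn in COST_FNS.items():
            candidates = []
            for q, (prev_cost, prev_plan) in best.items():
                extra = switch_cost if q != p else 0.0
                candidates.append((prev_cost + extra + fn(phase), prev_plan + [p]))
            new_best[p] = min(candidates, key=lambda c: c[0])
        best = new_best
    return min(best.values(), key=lambda c: c[0])


if __name__ == "__main__":
    long_runs = BlockProfile(writes=100, write_runs=5, rereaders=1)   # few, long write runs
    short_runs = BlockProfile(writes=20, write_runs=20, rereaders=4)   # many single-write runs
    print(choose_protocol(long_runs), choose_protocol(short_runs))
    print(plan_phases([long_runs, short_runs, long_runs], switch_cost=15))
```

With a cheap switch, the planner moves to update for the phase dominated by short write runs and back to invalidate afterward; with an expensive switch, staying on one protocol throughout is cheaper. A real model of the kind the abstract describes would replace these toy formulas with parameters measured or estimated for each application.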