In this paper, cache design is explored for large, high-performance multiprocessors with hundreds or thousands of processors and memory modules interconnected by a pipelined multistage network. Most multiprocessor cache studies in the literature focus exclusively on cache coherence enforcement. However, such multiprocessors have other characteristics that create an environment for cache performance very different from that of many uniprocessors.

Multiprocessor conditions are identified and modeled, including 1) the cost of a cache coherence enforcement scheme, 2) the effect of a high degree of overlap between cache miss services, 3) the cost of a pin-limited data path between shared memory and caches, 4) the effect of a high degree of data prefetching, 5) the program behavior of a scientific workload, as represented by 23 numerical subroutines, and 6) the parallel execution of programs. This model is used to show that the cache miss ratio is not a suitable performance measure in the multiprocessors of interest, and that the optimal cache block size in such multiprocessors is much smaller than in many uniprocessors.
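The Python sketch below illustrates why the miss ratio alone can mislead about block size. It is a toy cost function, not the paper's model. The miss ratios are invented, as are the names and values of `latency`, `words_per_cycle` and `overlap`. The sketch also simplifies overlap to "the fraction of network latency hidden from the processor". Under these assumptions, a narrow (pin-limited) data path makes each extra word of block transfer expensive. Overlapped miss services meanwhile hide most of the fixed latency that large blocks would otherwise amortize. Together, these two effects push the best block size down.

```python
# Hypothetical miss ratios per block size (in words). They decrease
# monotonically, so the miss ratio alone always favors the largest block.
# Illustrative values only; not measured data.
MISS_RATIO = {1: 0.20, 2: 0.12, 4: 0.075, 8: 0.05, 16: 0.038, 32: 0.032}


def miss_penalty(block_words: int, latency: float,
                 words_per_cycle: float, overlap: float) -> float:
    """Cycles one miss costs the processor in this toy model: the share of
    network latency NOT hidden by overlapped miss service, plus the time to
    stream the whole block across the data path."""
    exposed_latency = (1.0 - overlap) * latency
    transfer_time = block_words / words_per_cycle
    return exposed_latency + transfer_time


def stall_per_reference(block_words: int, latency: float,
                        words_per_cycle: float, overlap: float) -> float:
    """Expected stall cycles per memory reference: miss ratio times penalty."""
    return MISS_RATIO[block_words] * miss_penalty(
        block_words, latency, words_per_cycle, overlap)


def best_block(latency: float, words_per_cycle: float, overlap: float) -> int:
    """Block size (words) that minimizes expected stall per reference."""
    return min(MISS_RATIO,
               key=lambda b: stall_per_reference(b, latency, words_per_cycle, overlap))


if __name__ == "__main__":
    # Lowest miss ratio: always the largest block in this table.
    print("lowest miss ratio:", min(MISS_RATIO, key=MISS_RATIO.get))
    # Wide path, no overlap: fixed latency dominates, so larger blocks pay off.
    print("uniprocessor-like:", best_block(latency=10, words_per_cycle=4, overlap=0.0))
    # Pin-limited path with heavily overlapped misses: transfer time dominates.
    print("multiprocessor-like:", best_block(latency=40, words_per_cycle=1, overlap=0.875))
```

With these made-up parameters, the script prints 32, 16 and 8 in turn. Ranking by miss ratio picks the largest block. The wide-path, no-overlap case favors 16-word blocks. The pin-limited, highly overlapped case favors 8-word blocks. The numbers themselves carry no weight; the point is the direction in which the optimum moves.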