High-performance multiprocessors must incorporate a high-bandwidth, low-latency memory system to sustain maximal processor utilization, and cache memories are often used to meet this requirement. Even moderate cache miss rates can greatly degrade the performance of cache-based, shared-memory multiprocessors because memory-access times are usually much longer than cache-access times. In this paper we propose a lockup-free cache design in which the handling of one or several cache misses is overlapped with processor activity. In multiprocessors, lockup-free caches aggravate the memory coherence problem. Three cache architectures relying on different forms of compiler intervention are introduced. A performance model demonstrates the usefulness of lockup-free caches for high-performance processors. The merits and drawbacks of the three schemes are discussed, and compiler techniques that take advantage of the proposed designs are illustrated at the end of the paper.
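The central mechanism the abstract describes, overlapping the handling of several outstanding misses with continued processor activity, is commonly realized with a small table of miss-status entries, one per in-flight miss. The C++ sketch below is purely illustrative and is not the design from the paper: the constants (kNumMshrs, kMissLatency), the toy access stream, and the decision to let the processor proceed past every primary miss (ignoring data dependences) are all assumptions made for the example. It only shows the bookkeeping by which new accesses continue past in-flight misses until the table fills, at which point the processor stalls.

```cpp
// Minimal sketch (hypothetical, not the paper's design): a lockup-free cache
// modeled with a small table of miss-status entries. The processor keeps
// issuing accesses; a miss only stalls it when every entry is occupied.
#include <cstdint>
#include <iostream>
#include <unordered_set>
#include <vector>

constexpr int kNumMshrs      = 4;   // outstanding misses allowed before stalling (assumed)
constexpr int kMissLatency   = 10;  // cycles until memory returns a line (assumed)
constexpr uint64_t kLineBits = 6;   // 64-byte cache lines

struct Mshr {
    uint64_t line;      // block address being fetched
    int      ready_at;  // cycle at which the fill completes
};

int main() {
    std::unordered_set<uint64_t> cache;  // set of resident line addresses
    std::vector<Mshr> mshrs;             // outstanding (in-flight) misses

    // Toy access stream: mostly distinct lines, so several misses overlap.
    std::vector<uint64_t> accesses = {0x000, 0x040, 0x080, 0x0C0,
                                      0x000, 0x100, 0x140, 0x040};
    size_t next = 0;
    int stall_cycles = 0;

    for (int cycle = 0; next < accesses.size() || !mshrs.empty(); ++cycle) {
        // Retire any fills that have arrived from memory.
        for (auto it = mshrs.begin(); it != mshrs.end();) {
            if (it->ready_at <= cycle) { cache.insert(it->line); it = mshrs.erase(it); }
            else ++it;
        }
        if (next >= accesses.size()) continue;  // drain remaining misses

        uint64_t line = accesses[next] >> kLineBits;
        if (cache.count(line)) { ++next; continue; }   // hit: proceed immediately

        bool already_pending = false;                   // secondary miss to a pending line?
        for (const auto& m : mshrs) already_pending |= (m.line == line);

        if (already_pending) {
            ++next;                                     // merge with the in-flight miss
        } else if (mshrs.size() < kNumMshrs) {
            mshrs.push_back({line, cycle + kMissLatency});
            ++next;                                     // primary miss: allocate and keep going
        } else {
            ++stall_cycles;                             // structural stall: table is full
        }
    }
    std::cout << "stall cycles: " << stall_cycles << "\n";
}
```

With four entries and a 10-cycle miss latency, this toy stream stalls only while all entries are occupied; with a single entry (a blocking cache) every miss would stall the processor for the full latency, which is the contrast the lockup-free design exploits.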