High-performance multiprocessors must incorporate a high-bandwidth, low-latency memory system to sustain maximal processor utilization, and cache memories are often used to meet this requirement. Even moderate cache miss rates can greatly degrade the performance of cache-based, shared-memory multiprocessors because memory-access times are usually much longer than cache-access times. In this paper we propose a lockup-free cache design in which the handling of one or several cache misses is overlapped with processor activity. In multiprocessors, lockup-free caches aggravate the memory coherence problem. Three cache architectures relying on different forms of compiler intervention are introduced. A performance model demonstrates the usefulness of lockup-free caches for high-performance processors. The merits and drawbacks of the three schemes are discussed, and compiler techniques that take advantage of the proposed designs are illustrated at the end of the paper.
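The central mechanism the abstract describes, overlapping the handling of several outstanding misses with continued processor activity, is commonly realized with a small table of miss-status entries, one per in-flight miss. The C++ sketch below is purely illustrative and is not the design from the paper: the constants (kNumMshrs, kMissLatency), the toy access stream, and the decision to let the processor proceed past every primary miss (ignoring data dependences) are all assumptions made for the example. It only shows the bookkeeping by which new accesses continue past in-flight misses until the table fills, at which point the processor stalls.

```cpp
// Minimal sketch (hypothetical, not the paper's design): a lockup-free cache
// modeled with a small table of miss-status entries. The processor keeps
// issuing accesses; a miss only stalls it when every entry is occupied.
#include <cstdint>
#include <iostream>
#include <unordered_set>
#include <vector>

constexpr int kNumMshrs      = 4;   // outstanding misses allowed before stalling (assumed)
constexpr int kMissLatency   = 10;  // cycles until memory returns a line (assumed)
constexpr uint64_t kLineBits = 6;   // 64-byte cache lines

struct Mshr {
    uint64_t line;      // block address being fetched
    int      ready_at;  // cycle at which the fill completes
};

int main() {
    std::unordered_set<uint64_t> cache;  // set of resident line addresses
    std::vector<Mshr> mshrs;             // outstanding (in-flight) misses

    // Toy access stream: mostly distinct lines, so several misses overlap.
    std::vector<uint64_t> accesses = {0x000, 0x040, 0x080, 0x0C0,
                                      0x000, 0x100, 0x140, 0x040};
    size_t next = 0;
    int stall_cycles = 0;

    for (int cycle = 0; next < accesses.size() || !mshrs.empty(); ++cycle) {
        // Retire any fills that have arrived from memory.
        for (auto it = mshrs.begin(); it != mshrs.end();) {
            if (it->ready_at <= cycle) { cache.insert(it->line); it = mshrs.erase(it); }
            else ++it;
        }
        if (next >= accesses.size()) continue;  // drain remaining misses

        uint64_t line = accesses[next] >> kLineBits;
        if (cache.count(line)) { ++next; continue; }   // hit: proceed immediately

        bool already_pending = false;                   // secondary miss to a pending line?
        for (const auto& m : mshrs) already_pending |= (m.line == line);

        if (already_pending) {
            ++next;                                     // merge with the in-flight miss
        } else if (mshrs.size() < kNumMshrs) {
            mshrs.push_back({line, cycle + kMissLatency});
            ++next;                                     // primary miss: allocate and keep going
        } else {
            ++stall_cycles;                             // structural stall: table is full
        }
    }
    std::cout << "stall cycles: " << stall_cycles << "\n";
}
```

With four entries and a 10-cycle miss latency, this toy stream stalls only while all entries are occupied; with a single entry (a blocking cache) every miss would stall the processor for the full latency, which is the contrast the lockup-free design exploits.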