In this paper, we introduce a simple hardware mechanism that supports non-blocking loads in conjunction with lockup-free caches to hide memory latency in high-performance processors. The cache and processor cooperate on load misses so that the overall complexity of the non-blocking mechanisms in the cache and in the processor is greatly reduced. We use detailed simulations to evaluate the effectiveness of the architecture, and of a simple compiler transformation, at hiding miss latencies of up to 200 processor cycles. For a given program, we identify a critical latency. For latencies below this critical latency, the non-blocking processor/cache architecture achieves perfect memory latency tolerance by overlapping misses with processor execution. For higher latencies, significant improvements in processor efficiency are still obtained by overlapping multiple misses with one another. A simple model is used to illustrate this effect, and improvements are proposed based on the results.
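The critical-latency effect can be sketched with a toy efficiency model. This is an illustrative assumption, not the model actually used in the paper: it supposes that each load miss overlaps with up to a fixed number of cycles of independent work, so only the remainder of the miss latency stalls the processor. The parameter names and default values (`miss_rate_per_instr`, `cpi_base`) are hypothetical.

```python
def efficiency(miss_latency, overlap_cycles, miss_rate_per_instr=0.02, cpi_base=1.0):
    """Toy model of processor efficiency under non-blocking loads.

    Each miss is assumed to overlap with up to `overlap_cycles` of
    independent execution; only the unhidden remainder stalls the CPU.
    Efficiency is the ratio of base cycles to total cycles per instruction.
    """
    # Stall per miss: whatever part of the latency cannot be hidden.
    stall_per_miss = max(0, miss_latency - overlap_cycles)
    # Average cycles per instruction including exposed miss stalls.
    cpi = cpi_base + miss_rate_per_instr * stall_per_miss
    return cpi_base / cpi
```

In this sketch, `overlap_cycles` plays the role of the critical latency: for any `miss_latency` at or below it, efficiency stays at 1.0 (perfect tolerance), while above it efficiency degrades gracefully rather than falling off a cliff, consistent with the behavior the abstract describes.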