A distributed predictive cache for high performance computer systems
Microprocessor execution speeds are improving at a rate of 50%-80% per year, while DRAM access times are improving at a much lower rate of 5%-10% per year. Computer systems are rapidly approaching the point at which overall system performance is determined not by the speed of the CPU but by the speed of the memory system. We present a high performance memory system architecture that overcomes the growing speed disparity between high performance microprocessors and current generation DRAMs. A novel prediction and prefetching technique is combined with a distributed cache architecture to build a high performance memory system. We use table-driven prediction and a prediction cache to prefetch data from the on-chip DRAM array to an on-chip SRAM prefetch buffer. By prefetching data we are able to hide the large latency associated with DRAM access and cycle times. Our experiments show that with a small (32 KB) prediction cache we can achieve an effective main memory access time that is close to the access time of larger secondary caches.
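To illustrate the mechanism, the following is a minimal sketch of table-driven prediction feeding a prefetch buffer. It is not the paper's implementation: the table organization, replacement policy, and sizes here (an LRU correlation table mapping each address to its last observed successor, plus a small buffer modeled as a set) are illustrative assumptions, and all class and parameter names are hypothetical.

```python
from collections import OrderedDict

class PredictionCache:
    """Table-driven predictor: maps an address to the address that
    followed it last time, with LRU replacement (an assumed policy)."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.table = OrderedDict()  # address -> predicted next address

    def predict(self, addr):
        if addr in self.table:
            self.table.move_to_end(addr)   # refresh LRU position
            return self.table[addr]
        return None

    def update(self, prev_addr, next_addr):
        self.table[prev_addr] = next_addr
        self.table.move_to_end(prev_addr)
        if len(self.table) > self.capacity:
            self.table.popitem(last=False)  # evict least-recently-used entry

class PrefetchingMemory:
    """Models a DRAM array with an SRAM prefetch buffer fed by the predictor."""
    def __init__(self, table_entries=8192, buffer_lines=64):
        self.predictor = PredictionCache(table_entries)
        self.buffer = set()          # addresses resident in the SRAM prefetch buffer
        self.buffer_lines = buffer_lines
        self.prev = None
        self.hits = self.accesses = 0

    def access(self, addr):
        self.accesses += 1
        if addr in self.buffer:
            self.hits += 1           # served at SRAM speed, DRAM latency hidden
            self.buffer.discard(addr)
        if self.prev is not None:
            self.predictor.update(self.prev, addr)  # learn observed succession
        self.prev = addr
        pred = self.predictor.predict(addr)
        if pred is not None:         # prefetch the predicted successor
            if len(self.buffer) >= self.buffer_lines:
                self.buffer.pop()
            self.buffer.add(pred)

mem = PrefetchingMemory()
trace = [0, 64, 128, 0, 64, 128] * 50   # repetitive access pattern
for a in trace:
    mem.access(a)
print(f"prefetch-buffer hit rate: {mem.hits / mem.accesses:.2f}")
```

On a repetitive trace like this, after one warm-up pass nearly every access is served from the prefetch buffer. Given a buffer hit rate h, the effective access time is h·t_SRAM + (1−h)·t_DRAM, which is how a small prediction cache can pull the average main memory access time down toward SRAM speeds.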