Correlation Prefetching with a User-Level Memory Thread

Authors:
Yan Solihin;Jaejin Lee;Josep Torrellas
Affiliations:
-;-;-
Venue:
IEEE Transactions on Parallel and Distributed Systems
Year:
2003

Citing 24
Cited 3

Reducing memory latency via non-blocking and prefetching caches

ASPLOS V Proceedings of the fifth international conference on Architectural support for programming languages and operating systems
Evaluating stream buffers as a secondary cache replacement

ISCA '94 Proceedings of the 21st annual international symposium on Computer architecture
Speeding up irregular applications in shared-memory multiprocessors: memory binding and group prefetching

ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
SPAID: software prefetching in pointer- and call-intensive environments

Proceedings of the 28th annual international symposium on Microarchitecture
Compiler-based prefetching for recursive data structures

Proceedings of the seventh international conference on Architectural support for programming languages and operating systems
Prefetching using Markov predictors

Proceedings of the 24th annual international symposium on Computer architecture
Comparing data forwarding and prefetching for communication-induced misses in shared-memory MPs

ICS '98 Proceedings of the 12th international conference on Supercomputing
Dependence based prefetching for linked data structures

Proceedings of the eighth international conference on Architectural support for programming languages and operating systems
Simultaneous subordinate microthreading (SSMT)

ISCA '99 Proceedings of the 26th annual international symposium on Computer architecture
Improving direct-mapped cache performance by the addition of a small fully-associative cache and prefetch buffers

ISCA '90 Proceedings of the 17th annual international symposium on Computer Architecture
Push vs. pull: data movement for linked data structures

Proceedings of the 14th international conference on Supercomputing
Predictor-directed stream buffers

Proceedings of the 33rd annual ACM/IEEE international symposium on Microarchitecture
Dead-block prediction & dead-block correlating prefetchers

ISCA '01 Proceedings of the 28th annual international symposium on Computer architecture
Using a user-level memory thread for correlation prefetching

ISCA '02 Proceedings of the 29th annual international symposium on Computer architecture
Timekeeping in the memory system: predicting and optimizing memory behavior

ISCA '02 Proceedings of the 29th annual international symposium on Computer architecture
Scalable Processors in the Billion-Transistor Era: IRAM

Computer
Multi-Chain Prefetching: Effective Exploitation of Inter-Chain Memory Parallelism for Pointer-Chasing Codes

Proceedings of the 2001 International Conference on Parallel Architectures and Compilation Techniques
Distributed Prefetch-buffer/Cache Design for High Performance Memory Systems

HPCA '96 Proceedings of the 2nd IEEE Symposium on High-Performance Computer Architecture
Impulse: Building a Smarter Memory Controller

HPCA '99 Proceedings of the 5th International Symposium on High Performance Computer Architecture
TCP: Tag Correlating Prefetchers

HPCA '03 Proceedings of the 9th International Symposium on High-Performance Computer Architecture
An Direct-Execution Framework for Fast and Accurate Simulation of Superscalar Processors

PACT '98 Proceedings of the 1998 International Conference on Parallel Architectures and Compilation Techniques
Speculative Data-Driven Multithreading

HPCA '01 Proceedings of the 7th International Symposium on High-Performance Computer Architecture
Automatically Mapping Code on an Intelligent Memory Architecture

HPCA '01 Proceedings of the 7th International Symposium on High-Performance Computer Architecture
Data prefetch mechanisms for accelerating symbolic and numeric computation

Data prefetch mechanisms for accelerating symbolic and numeric computation

Performance characteristics of MAUI: an intelligent memory system architecture

Proceedings of the 2005 workshop on Memory system performance
The bandwidth expansion effectiveness of cache levels block prefetch

ISHPC'05/ALPS'06 Proceedings of the 6th international symposium on high-performance computing and 1st international conference on Advanced low power systems
Meeting midway: improving CMP performance with memory-side prefetching

PACT '13 Proceedings of the 22nd international conference on Parallel architectures and compilation techniques

Quantified Score

Hi-index	0.00

Visualization

Abstract

This paper proposes using a User-Level Memory Thread (ULMT) for correlation prefetching. In this approach, a user thread runs on a general-purpose processor in main memory, either in the memory controller chip or in a DRAM chip. The thread performs correlation prefetching in software, sending the prefetched data into the L2 cache of the main processor. This approach requires minimal hardware beyond the memory processor: The correlation table is a software data structure that resides in main memory, while the main processor only needs a few modifications to its L2 cache so that it can accept incoming prefetches. In addition, the approach has wide applicability, as it can effectively prefetch even for irregular applications. Finally, it is very flexible, as the prefetching algorithm can be customized by the user on an application basis. Our simulation results show that, through a new design of the correlation table and prefetching algorithm, our scheme delivers good results. Specifically, nine mostly-irregular applications show an average speedup of 1.32. Furthermore, our scheme works well in combination with a conventional processor-side sequential prefetcher, in which case the average speedup increases to 1.46. Finally, by exploiting the customization of the prefetching algorithm, we increase the average speedup to 1.53.