A reuse-aware prefetching scheme for scratchpad memory

Authors:
Jason Cong;Hui Huang;Chunyue Liu;Yi Zou
Affiliations:
University of California, Los Angeles, Los Angeles, CA;University of California, Los Angeles, Los Angeles, CA;University of California, Los Angeles, Los Angeles, CA;University of California, Los Angeles, Los Angeles, CA
Venue:
Proceedings of the 48th Design Automation Conference
Year:
2011

Citing 21
Cited 7

Design and evaluation of a compiler algorithm for prefetching

ASPLOS V Proceedings of the fifth international conference on Architectural support for programming languages and operating systems
Tolerating latency through software-controlled data prefetching

Tolerating latency through software-controlled data prefetching
Data prefetch mechanisms

ACM Computing Surveys (CSUR)
Dynamic management of scratch-pad memory space

Proceedings of the 38th annual Design Automation Conference
Storage allocation for embedded processors

CASES '01 Proceedings of the 2001 international conference on Compilers, architecture, and synthesis for embedded systems
Optimizing compilers for modern architectures: a dependence-based approach

Optimizing compilers for modern architectures: a dependence-based approach
Compiler-directed scratch pad memory hierarchy design and management

Proceedings of the 39th annual Design Automation Conference
An optimal memory allocation scheme for scratch-pad-based embedded systems

ACM Transactions on Embedded Computing Systems (TECS)
Simics: A Full System Simulation Platform

Computer
Scratchpad memory: design alternative for cache on-chip memory in embedded systems

Proceedings of the tenth international symposium on Hardware/software codesign
Compiler-decided dynamic memory allocation for scratch-pad based embedded systems

Proceedings of the 2003 international conference on Compilers, architecture and synthesis for embedded systems
Compiler orchestrated prefetching via speculation and predication

ASPLOS XI Proceedings of the 11th international conference on Architectural support for programming languages and operating systems
Multifacet's general execution-driven multiprocessor simulator (GEMS) toolset

ACM SIGARCH Computer Architecture News - Special issue: dasCMP'05
Data partitioning for maximal scratchpad usage

ASP-DAC '03 Proceedings of the 2003 Asia and South Pacific Design Automation Conference
DRDU: A data reuse analysis technique for efficient scratch-pad memory management

ACM Transactions on Design Automation of Electronic Systems (TODAES)
Automatic data movement and computation mapping for multi-level parallel architectures with explicitly managed memories

Proceedings of the 13th ACM SIGPLAN Symposium on Principles and practice of parallel programming
Prefetching irregular references for software cache on cell

Proceedings of the 6th annual IEEE/ACM international symposium on Code generation and optimization
SPM management using Markov chain based data access prediction

Proceedings of the 2008 IEEE/ACM International Conference on Computer-Aided Design
Compiler-directed scratchpad memory management via graph coloring

ACM Transactions on Architecture and Code Optimization (TACO)
McPAT: an integrated power, area, and timing modeling framework for multicore and manycore architectures

Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture
Scratchpad Memory Management Techniques for Code in Embedded Systems without an MMU

IEEE Transactions on Computers

An energy-efficient adaptive hybrid cache

Proceedings of the 17th IEEE/ACM international symposium on Low-power electronics and design
HC-Sim: a fast and exact l1 cache simulator with scratchpad memory co-simulation support

CODES+ISSS '11 Proceedings of the seventh IEEE/ACM/IFIP international conference on Hardware/software codesign and system synthesis
Optimizing remote accesses for offloaded kernels: application to high-level synthesis for FPGA

Proceedings of the 17th ACM SIGPLAN symposium on Principles and Practice of Parallel Programming
Optimizing memory hierarchy allocation with loop transformations for high-level synthesis

Proceedings of the 49th Annual Design Automation Conference
Static and dynamic co-optimizations for blocks mapping in hybrid caches

Proceedings of the 2012 ACM/IEEE international symposium on Low power electronics and design
Integrating software caches with scratch pad memory

Proceedings of the 2012 international conference on Compilers, architectures and synthesis for embedded systems
Optimizing remote accesses for offloaded kernels: application to high-level synthesis for FPGA

Proceedings of the Conference on Design, Automation and Test in Europe

Quantified Score

Hi-index	0.00

Visualization

Abstract

Scratchpad memory (SPM) has been utilized as prefetch buffer in embedded systems and parallel architectures to hide memory access latency. However, the impact of reuse pattern on SPM prefetching has not been fully investigated. In this paper we quantify the impact of reuse on SPM prefetching efficiency and propose a reuse-aware SPM prefetching (RASP) scheme. The average performance and energy improvements are 15.9% and 22.0% over cache prefetching, 12.9% and 31.2% over prefetch-only SPM management, 18.5% and 10% over DRDU [1] with SPM prefetching support.