Design and evaluation of a compiler algorithm for prefetching
ASPLOS V Proceedings of the fifth international conference on Architectural support for programming languages and operating systems
Tolerating latency through software-controlled data prefetching
Tolerating latency through software-controlled data prefetching
ACM Computing Surveys (CSUR)
Dynamic management of scratch-pad memory space
Proceedings of the 38th annual Design Automation Conference
Storage allocation for embedded processors
CASES '01 Proceedings of the 2001 international conference on Compilers, architecture, and synthesis for embedded systems
Optimizing compilers for modern architectures: a dependence-based approach
Compiler-directed scratch pad memory hierarchy design and management
Proceedings of the 39th annual Design Automation Conference
An optimal memory allocation scheme for scratch-pad-based embedded systems
ACM Transactions on Embedded Computing Systems (TECS)
Scratchpad memory: design alternative for cache on-chip memory in embedded systems
Proceedings of the tenth international symposium on Hardware/software codesign
Compiler-decided dynamic memory allocation for scratch-pad based embedded systems
Proceedings of the 2003 international conference on Compilers, architecture and synthesis for embedded systems
Compiler orchestrated prefetching via speculation and predication
ASPLOS XI Proceedings of the 11th international conference on Architectural support for programming languages and operating systems
Multifacet's general execution-driven multiprocessor simulator (GEMS) toolset
ACM SIGARCH Computer Architecture News - Special issue: dasCMP'05
Data partitioning for maximal scratchpad usage
ASP-DAC '03 Proceedings of the 2003 Asia and South Pacific Design Automation Conference
DRDU: A data reuse analysis technique for efficient scratch-pad memory management
ACM Transactions on Design Automation of Electronic Systems (TODAES)
Proceedings of the 13th ACM SIGPLAN Symposium on Principles and practice of parallel programming
Prefetching irregular references for software cache on cell
Proceedings of the 6th annual IEEE/ACM international symposium on Code generation and optimization
SPM management using Markov chain based data access prediction
Proceedings of the 2008 IEEE/ACM International Conference on Computer-Aided Design
Compiler-directed scratchpad memory management via graph coloring
ACM Transactions on Architecture and Code Optimization (TACO)
Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture
Scratchpad Memory Management Techniques for Code in Embedded Systems without an MMU
IEEE Transactions on Computers
An energy-efficient adaptive hybrid cache
Proceedings of the 17th IEEE/ACM international symposium on Low-power electronics and design
HC-Sim: a fast and exact L1 cache simulator with scratchpad memory co-simulation support
CODES+ISSS '11 Proceedings of the seventh IEEE/ACM/IFIP international conference on Hardware/software codesign and system synthesis
Optimizing remote accesses for offloaded kernels: application to high-level synthesis for FPGA
Proceedings of the 17th ACM SIGPLAN symposium on Principles and Practice of Parallel Programming
Optimizing memory hierarchy allocation with loop transformations for high-level synthesis
Proceedings of the 49th Annual Design Automation Conference
Static and dynamic co-optimizations for blocks mapping in hybrid caches
Proceedings of the 2012 ACM/IEEE international symposium on Low power electronics and design
Integrating software caches with scratch pad memory
Proceedings of the 2012 international conference on Compilers, architectures and synthesis for embedded systems
Optimizing remote accesses for offloaded kernels: application to high-level synthesis for FPGA
Proceedings of the Conference on Design, Automation and Test in Europe
Scratchpad memory (SPM) has been used as a prefetch buffer in embedded systems and parallel architectures to hide memory access latency. However, the impact of data reuse patterns on SPM prefetching has not been fully investigated. In this paper, we quantify the impact of reuse on SPM prefetching efficiency and propose a reuse-aware SPM prefetching (RASP) scheme. On average, RASP achieves performance and energy improvements of 15.9% and 22.0% over cache prefetching, 12.9% and 31.2% over prefetch-only SPM management, and 18.5% and 10% over DRDU [1] with SPM prefetching support.