The literature on scratchpad memories (SPMs) seems to indicate that dynamic overlaying supersedes static, non-overlay-based (NOB) allocation. Although overlay-based (OVB) techniques that operate on source code might exploit multiple hot spots for higher energy savings, they cannot exploit libraries. When operating on binaries, OVB approaches lead to smaller savings, often require dedicated hardware, and sometimes prevent data allocation. Moreover, all savings reported so far ignore that, in cache-based systems, caches are likely to be optimized prior to SPM allocation. We show experimental evidence that, when handling binaries, NOB memory savings (15% to 33% on average) are as good as or better than those of OVB. Since our savings (unlike those in related work) were measured after cache tuning, when there is less room for optimization, our results encourage the use of simpler NOB methods to build library-aware allocators that cannot depend on dedicated hardware. We also show that, given the capacity Ct of the equivalent pretuned cache, the optimal SPM size lies in [Ct/2, Ct] for 85% of the evaluated programs. Finally, we show counter-intuitive evidence that, even for cache-based architectures containing small SPMs, procedures should be preferred over basic blocks as the unit of allocation.
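To make the idea of static (NOB) allocation at procedure granularity concrete, the following is a minimal sketch that treats it as a 0/1 knapsack problem and evaluates the two endpoints of the interval [Ct/2, Ct]. It is an illustration under stated assumptions, not the allocator evaluated in this work: the `Proc` record, the `allocate` and `sweep` helpers, and all sizes and saving figures are hypothetical, and the per-procedure savings are assumed to come from some profiling step that is not shown.

```python
from typing import NamedTuple, Sequence, Tuple


class Proc(NamedTuple):
    """Hypothetical profile record for one procedure."""
    name: str
    size: int    # code size in bytes
    saving: int  # assumed energy saved if the procedure lives in the SPM


def allocate(procs: Sequence[Proc], capacity: int) -> Tuple[int, Tuple[str, ...]]:
    """0/1 knapsack: pick procedures that maximize total saving within capacity.

    Returns (total_saving, names_of_chosen_procedures).
    """
    # best[c] holds the best (saving, chosen names) using at most c bytes.
    best = [(0, ())] * (capacity + 1)
    for p in procs:
        # Iterate capacities downwards so each procedure is used at most once.
        for c in range(capacity, p.size - 1, -1):
            prev_saving, prev_names = best[c - p.size]
            if prev_saving + p.saving > best[c][0]:
                best[c] = (prev_saving + p.saving, prev_names + (p.name,))
    return best[capacity]


def sweep(procs: Sequence[Proc], ct: int):
    """Evaluate the two endpoints of [Ct/2, Ct] (ct is assumed to be even)."""
    return {size: allocate(procs, size) for size in (ct // 2, ct)}


# Made-up example data; the numbers carry no experimental meaning.
EXAMPLE = [
    Proc("fft", 96, 50),
    Proc("crc", 64, 40),
    Proc("memcpy", 32, 30),
    Proc("init", 128, 5),
]

if __name__ == "__main__":
    for size, (saving, names) in sweep(EXAMPLE, 256).items():
        print(size, saving, names)
```

Because the allocation is computed once and never changes at run time, a scheme of this kind needs no overlay manager or dedicated hardware, and the same formulation applies whether the profile records describe application procedures or library procedures extracted from binaries.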