Cache-tuning-aware scratchpad allocation from binaries

Authors:
Daniel Pereira Volpato;Alexandre Keunecke Ignácio Mendonca;José Luís Almada Güntzel;Luiz Cláudio Villar dos Santos
Affiliations:
Federal University of Santa Catarina, Florianopolis, Brazil;Federal University of Santa Catarina, Florianopolis, Brazil;Federal University of Santa Catarina, Florianopolis, Brazil;Federal University of Santa Catarina, Florianopolis, Brazil
Venue:
Proceedings of the 24th symposium on Integrated circuits and systems design
Year:
2011

Abstract

The literature on scratchpad memories (SPMs) suggests that dynamic overlaying supersedes static, non-overlay-based (NOB) allocation. Although overlay-based (OVB) techniques that operate on source code may benefit from multiple hot spots to achieve higher energy savings, they cannot exploit libraries. When operating on binaries, OVB approaches lead to smaller savings, often require dedicated hardware, and sometimes prevent data allocation. Moreover, all savings reported so far ignore that, in cache-based systems, caches are likely to be optimized before SPM allocation. We show experimental evidence that, when handling binaries, NOB memory savings (15% to 33% on average) are as good as or better than those of OVB approaches. Since our savings, unlike those in related work, were measured after cache tuning, when there is less room for optimization, our results encourage the use of simpler NOB methods to build library-aware allocators that cannot depend on dedicated hardware. We also show that, given the capacity Ct of the equivalent pretuned cache, the optimal SPM size lies in [Ct/2, Ct] for 85% of the evaluated programs. Finally, we show counter-intuitive evidence that, even for cache-based architectures containing small SPMs, procedures should be preferred over basic blocks as allocation units.