Profile guided code positioning
PLDI '90 Proceedings of the ACM SIGPLAN 1990 conference on Programming language design and implementation
A data locality optimizing algorithm
PLDI '91 Proceedings of the ACM SIGPLAN 1991 conference on Programming language design and implementation
A retargetable compiler for ANSI C
ACM SIGPLAN Notices
A fully associative software-managed cache design
Proceedings of the 27th annual international symposium on Computer architecture
On-chip vs. off-chip memory: the data partitioning problem in embedded processor-based systems
ACM Transactions on Design Automation of Electronic Systems (TODAES)
Dynamic management of scratch-pad memory space
Proceedings of the 38th annual Design Automation Conference
Exploiting scratch-pad memory using Presburger formulas
Proceedings of the 14th international symposium on Systems synthesis
Storage allocation for embedded processors
CASES '01 Proceedings of the 2001 international conference on Compilers, architecture, and synthesis for embedded systems
Heterogeneous memory management for embedded systems
CASES '01 Proceedings of the 2001 international conference on Compilers, architecture, and synthesis for embedded systems
Reducing energy consumption by dynamic copying of instructions onto onchip memory
Proceedings of the 15th international symposium on System Synthesis
An optimal memory allocation scheme for scratch-pad-based embedded systems
ACM Transactions on Embedded Computing Systems (TECS)
Effective Hardware-Based Data Prefetching for High-Performance Processors
IEEE Transactions on Computers
Scratchpad memory: design alternative for cache on-chip memory in embedded systems
Proceedings of the tenth international symposium on Hardware/software codesign
Efficient Utilization of Scratch-Pad Memory in Embedded Processor Applications
EDTC '97 Proceedings of the 1997 European conference on Design and Test
Temporal-Based Procedure Reordering for Improved Instruction Cache Performance
HPCA '98 Proceedings of the 4th International Symposium on High-Performance Computer Architecture
Assigning Program and Data Objects to Scratchpad for Energy Reduction
Proceedings of the conference on Design, automation and test in Europe
Compiler-decided dynamic memory allocation for scratch-pad based embedded systems
Proceedings of the 2003 international conference on Compilers, architecture and synthesis for embedded systems
Cache-Aware Scratchpad Allocation Algorithm
Proceedings of the conference on Design, automation and test in Europe - Volume 2
An integrated hardware/software approach for run-time scratchpad management
Proceedings of the 41st annual Design Automation Conference
Dynamic overlay of scratchpad memory for energy minimization
Proceedings of the 2nd IEEE/ACM/IFIP international conference on Hardware/software codesign and system synthesis
Layer Assignment echniques for Low Energy in Multi-Layered Memory Organisations
DATE '03 Proceedings of the conference on Design, Automation and Test in Europe - Volume 1
Compiler-optimized usage of partitioned memories
WMPI '04 Proceedings of the 3rd workshop on Memory performance issues: in conjunction with the 31st international symposium on computer architecture
Memory allocation for embedded systems with a compile-time-unknown scratch-pad size
Proceedings of the 2005 international conference on Compilers, architectures and synthesis for embedded systems
Hardware/software managed scratchpad memory for embedded system
Proceedings of the 2004 IEEE/ACM International conference on Computer-aided design
A novel instruction scratchpad memory optimization method based on concomitance metric
ASP-DAC '06 Proceedings of the 2006 Asia and South Pacific Design Automation Conference
MiBench: A free, commercially representative embedded benchmark suite
WWC '01 Proceedings of the Workload Characterization, 2001. WWC-4. 2001 IEEE International Workshop
An integrated scratch-pad allocator for affine and non-affine code
Proceedings of the conference on Design, automation and test in Europe: Proceedings
Dynamic allocation for scratch-pad memory using compile-time decisions
ACM Transactions on Embedded Computing Systems (TECS)
Scratchpad memory management for portable systems with a memory management unit
EMSOFT '06 Proceedings of the 6th ACM & IEEE International conference on Embedded software
Heap data allocation to scratch-pad memory in embedded systems
Journal of Embedded Computing - Cache exploitation in embedded systems
A novel technique to use scratch-pad memory for stack management
Proceedings of the conference on Design, automation and test in Europe
SCOPES '07 Proceedingsof the 10th international workshop on Software & compilers for embedded systems
Program restructuring for virtual memory
IBM Systems Journal
Application-driven synthesis of memory-intensive systems-on-chip
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
An efficient profile-based algorithm for scratchpad memory partitioning
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
Processor energy characterization for compiler-assisted software energy reduction
Journal of Electrical and Computer Engineering
Journal of Parallel and Distributed Computing
Hi-index | 0.00 |
Previous researches show that a scratchpad memory device consumed less energy than a cache device with the same capacity. In this article, we locate the scratchpad memory (SPM) in the top level of the memory hierarchy to reduce the energy consumption. To take the advantage of a SPM, we address two issues of utilizing a SPM. First, the program's locality should be improved. The second issue is SPM management. To tackle these two issues, we present a hardware/software framework for dynamically allocating both instructions and data in SPM. The software flow could be divided into three phases: locality improving, locality extraction, and runtime SPM management. Without modifying the original compiler and the source code, we improve the locality of a program. An optimization algorithm is proposed to extract the SPM allocations. At runtime, an SPM management program is employed. In hardware, an address translation logic (ATL) is proposed to reduce the overhead of SPM management. The results show that the proposed framework can reduce energy delay product (EDP) by 63%, on average, when compared with the traditional cache architecture. The reduction in EDP is contributed by properly allocating both instructions and data in SPM. By allocating only instructions in SPM, the EDPs are reduced by 45%, on average. By allocating only data in SPM, the EDPs are reduced by 14%, on average.