Introduction to algorithms
Profile guided code positioning
PLDI '90 Proceedings of the ACM SIGPLAN 1990 conference on Programming language design and implementation
MediaBench: a tool for evaluating and synthesizing multimedia and communicatons systems
MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
Hardware-software trade-offs in a direct Rambus implementation of the RAMpage memory hierarchy
Proceedings of the eighth international conference on Architectural support for programming languages and operating systems
Dynamic storage allocation in the Atlas computer, including an automatic use of a backing store
Communications of the ACM
Dynamic management of scratch-pad memory space
Proceedings of the 38th annual Design Automation Conference
Compiler-directed scratch pad memory hierarchy design and management
Proceedings of the 39th annual Design Automation Conference
Reducing energy consumption by dynamic copying of instructions onto onchip memory
Proceedings of the 15th international symposium on System Synthesis
Scratchpad memory: design alternative for cache on-chip memory in embedded systems
Proceedings of the tenth international symposium on Hardware/software codesign
Efficient Utilization of Scratch-Pad Memory in Embedded Processor Applications
EDTC '97 Proceedings of the 1997 European conference on Design and Test
The working set model for program behavior
SOSP '67 Proceedings of the first ACM symposium on Operating System Principles
Compiler-decided dynamic memory allocation for scratch-pad based embedded systems
Proceedings of the 2003 international conference on Compilers, architecture and synthesis for embedded systems
Polynomial-time algorithm for on-chip scratchpad memory partitioning
Proceedings of the 2003 international conference on Compilers, architecture and synthesis for embedded systems
Cache-Aware Scratchpad Allocation Algorithm
Proceedings of the conference on Design, automation and test in Europe - Volume 2
An integrated hardware/software approach for run-time scratchpad management
Proceedings of the 41st annual Design Automation Conference
Compiler-assisted demand paging for embedded systems with flash memory
Proceedings of the 4th ACM international conference on Embedded software
A post-compiler approach to scratchpad mapping of code
Proceedings of the 2004 international conference on Compilers, architecture, and synthesis for embedded systems
Compilation techniques for energy reduction in horizontally partitioned cache architectures
Proceedings of the 2005 international conference on Compilers, architectures and synthesis for embedded systems
Memory allocation for embedded systems with a compile-time-unknown scratch-pad size
Proceedings of the 2005 international conference on Compilers, architectures and synthesis for embedded systems
Memory Coloring: A Compiler Approach for Scratchpad Memory Management
Proceedings of the 14th International Conference on Parallel Architectures and Compilation Techniques
A novel instruction scratchpad memory optimization method based on concomitance metric
ASP-DAC '06 Proceedings of the 2006 Asia and South Pacific Design Automation Conference
MiBench: A free, commercially representative embedded benchmark suite
WWC '01 Proceedings of the Workload Characterization, 2001. WWC-4. 2001 IEEE International Workshop
A dynamic code placement technique for scratchpad memory using postpass optimization
CASES '06 Proceedings of the 2006 international conference on Compilers, architecture and synthesis for embedded systems
Scratchpad memory management for portable systems with a memory management unit
EMSOFT '06 Proceedings of the 6th ACM & IEEE International conference on Embedded software
Heap data allocation to scratch-pad memory in embedded systems
Journal of Embedded Computing - Cache exploitation in embedded systems
Dynamic data scratchpad memory management for a memory subsystem with an MMU
Proceedings of the 2007 ACM SIGPLAN/SIGBED conference on Languages, compilers, and tools for embedded systems
Scratchpad memory management in a multitasking environment
EMSOFT '08 Proceedings of the 8th ACM international conference on Embedded software
Scratchpad allocation for concurrent embedded software
ACM Transactions on Programming Languages and Systems (TOPLAS)
B2P2: bounds based procedure placement for instruction TLB power reduction in embedded systems
Proceedings of the 13th International Workshop on Software & Compilers for Embedded Systems
Fine-grain dynamic instruction placement for L0 scratch-pad memory
CASES '10 Proceedings of the 2010 international conference on Compilers, architectures and synthesis for embedded systems
CASES '10 Proceedings of the 2010 international conference on Compilers, architectures and synthesis for embedded systems
Optimal WCET-aware code selection for scratchpad memory
EMSOFT '10 Proceedings of the tenth ACM international conference on Embedded software
Dynamic and adaptive SPM management for a multi-task environment
Journal of Systems Architecture: the EUROMICRO Journal
Cache-tuning-aware scratchpad allocation from binaries
Proceedings of the 24th symposium on Integrated circuits and systems design
CODES+ISSS '11 Proceedings of the seventh IEEE/ACM/IFIP international conference on Hardware/software codesign and system synthesis
Architecture extensions for efficient management of scratch-pad memory
PATMOS'11 Proceedings of the 21st international conference on Integrated circuit and system design: power and timing modeling, optimization, and simulation
Link-time optimization for power efficiency in a tagless instruction cache
CGO '11 Proceedings of the 9th Annual IEEE/ACM International Symposium on Code Generation and Optimization
Processor energy characterization for compiler-assisted software energy reduction
Journal of Electrical and Computer Engineering
Proceedings of the 49th Annual Design Automation Conference
WCET-aware data selection and allocation for scratchpad memory
Proceedings of the 13th ACM SIGPLAN/SIGBED International Conference on Languages, Compilers, Tools and Theory for Embedded Systems
Embedded RAIDs-on-chip for bus-based chip-multiprocessors
ACM Transactions on Embedded Computing Systems (TECS)
Hi-index | 0.00 |
In this work, we present a dynamic memory allocation technique for a novel, horizontally partitioned memory subsystem targeting contemporary embedded processors with a memory management unit (MMU). We propose to replace the on-chip instruction cache with a scratchpad memory (SPM) and a small minicache. Serializing the address translation with the actual memory access enables the memory system to access either only the SPM or the minicache. Independent of the SPM size and based solely on profiling information, a postpass optimizer classifies the code of an application binary into a pageable and a cacheable code region. The latter is placed at a fixed location in the external memory and cached by the minicache. The former, the pageable code region, is copied on demand to the SPM before execution. Both the pageable code region and the SPM are logically divided into pages the size of an MMU memory page. Using the MMU's pagefault exception mechanism, a runtime scratchpad memory manager (SPMM) tracks page accesses and copies frequently executed code pages to the SPM before they get executed. In order to minimize the number of page transfers from the external memory to the SPM, good code placement techniques become more important with increasing sizes of the MMU pages. We discuss code-grouping techniques and provide an analysis of the effect of the MMU's page size on execution time, energy consumption, and external memory accesses. We show that by using the data cache as a victim buffer for the SPM, significant energy savings are possible. We evaluate our SPM allocation strategy with fifteen applications, including H.264, MP3, MPEG-4, and PGP. The proposed memory system requires 8&percent; less die are compared to a fully-cached configuration. On average, we achieve a 31&percent; improvement in runtime performance and a 35&percent; reduction in energy consumption with an MMU page size of 256 bytes.