Power analysis and minimization techniques for embedded DSP software
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Computer architecture (2nd ed.): a quantitative approach
Computer architecture (2nd ed.): a quantitative approach
A fully associative software-managed cache design
Proceedings of the 27th annual international symposium on Computer architecture
On-chip vs. off-chip memory: the data partitioning problem in embedded processor-based systems
ACM Transactions on Design Automation of Electronic Systems (TODAES)
Optimal spilling for CISC machines with few registers
Proceedings of the ACM SIGPLAN 2001 conference on Programming language design and implementation
Storage allocation for embedded processors
CASES '01 Proceedings of the 2001 international conference on Compilers, architecture, and synthesis for embedded systems
Compiler Support for Scalable and Efficient Memory Systems
IEEE Transactions on Computers
Scratchpad memory: design alternative for cache on-chip memory in embedded systems
Proceedings of the tenth international symposium on Hardware/software codesign
Compiler-decided dynamic memory allocation for scratch-pad based embedded systems
Proceedings of the 2003 international conference on Compilers, architecture and synthesis for embedded systems
Cache-Aware Scratchpad Allocation Algorithm
Proceedings of the conference on Design, automation and test in Europe - Volume 2
Measuring the cache interference cost in preemptive real-time systems
Proceedings of the 2004 ACM SIGPLAN/SIGBED conference on Languages, compilers, and tools for embedded systems
Adaptive code unloading for resource-constrained JVMs
Proceedings of the 2004 ACM SIGPLAN/SIGBED conference on Languages, compilers, and tools for embedded systems
EMBARC: an efficient memory bank assignment algorithm for retargetable compilers
Proceedings of the 2004 ACM SIGPLAN/SIGBED conference on Languages, compilers, and tools for embedded systems
Fast, predictable and low energy memory references through architecture-aware compilation
Proceedings of the 2004 Asia and South Pacific Design Automation Conference
Dynamic overlay of scratchpad memory for energy minimization
Proceedings of the 2nd IEEE/ACM/IFIP international conference on Hardware/software codesign and system synthesis
Compiler-assisted demand paging for embedded systems with flash memory
Proceedings of the 4th ACM international conference on Embedded software
Cluster miss prediction with prefetch on miss for embedded CPU instruction caches
Proceedings of the 2004 international conference on Compilers, architecture, and synthesis for embedded systems
Extended Control Flow Graph Based Performance Optimization Using Scratch-Pad Memory
Proceedings of the conference on Design, Automation and Test in Europe - Volume 2
Proceedings of the 2005 ACM SIGPLAN conference on Programming language design and implementation
Distributed Data Cache Designs for Clustered VLIW Processors
IEEE Transactions on Computers
Memory allocation for embedded systems with a compile-time-unknown scratch-pad size
Proceedings of the 2005 international conference on Compilers, architectures and synthesis for embedded systems
Memory Coloring: A Compiler Approach for Scratchpad Memory Management
Proceedings of the 14th International Conference on Parallel Architectures and Compilation Techniques
Hardware/software managed scratchpad memory for embedded system
Proceedings of the 2004 IEEE/ACM International conference on Computer-aided design
A novel instruction scratchpad memory optimization method based on concomitance metric
ASP-DAC '06 Proceedings of the 2006 Asia and South Pacific Design Automation Conference
Analysis of scratch-pad and data-cache performance using statistical methods
ASP-DAC '06 Proceedings of the 2006 Asia and South Pacific Design Automation Conference
Shared Scratch-Pad Memory Space Management
ISQED '06 Proceedings of the 7th International Symposium on Quality Electronic Design
An integrated scratch-pad allocator for affine and non-affine code
Proceedings of the conference on Design, automation and test in Europe: Proceedings
Dynamic code overlay of SDF-modeled programs on low-end embedded systems
Proceedings of the conference on Design, automation and test in Europe: Proceedings
Dynamic allocation for scratch-pad memory using compile-time decisions
ACM Transactions on Embedded Computing Systems (TECS)
Performance analysis of binary code protection
WSC '05 Proceedings of the 37th conference on Winter simulation
A dynamic code placement technique for scratchpad memory using postpass optimization
CASES '06 Proceedings of the 2006 international conference on Compilers, architecture and synthesis for embedded systems
Integrated scratchpad memory optimization and task scheduling for MPSoC architectures
CASES '06 Proceedings of the 2006 international conference on Compilers, architecture and synthesis for embedded systems
DRDU: A data reuse analysis technique for efficient scratch-pad memory management
ACM Transactions on Design Automation of Electronic Systems (TODAES)
Heap data allocation to scratch-pad memory in embedded systems
Journal of Embedded Computing - Cache exploitation in embedded systems
Fast, accurate design space exploration of embedded systems memory configurations
Proceedings of the 2007 ACM symposium on Applied computing
Compiler-Directed Variable Latency Aware SPM Management to CopeWith Timing Problems
Proceedings of the International Symposium on Code Generation and Optimization
Dynamic data scratchpad memory management for a memory subsystem with an MMU
Proceedings of the 2007 ACM SIGPLAN/SIGBED conference on Languages, compilers, and tools for embedded systems
Scratchpad allocation for data aggregates in superperfect graphs
Proceedings of the 2007 ACM SIGPLAN/SIGBED conference on Languages, compilers, and tools for embedded systems
A novel technique to use scratch-pad memory for stack management
Proceedings of the conference on Design, automation and test in Europe
Reducing off-chip memory access costs using data recomputation in embedded chip multi-processors
Proceedings of the 44th annual Design Automation Conference
The case for the precision timed (PRET) machine
Proceedings of the 44th annual Design Automation Conference
Recursive function data allocation to scratch-pad memory
CASES '07 Proceedings of the 2007 international conference on Compilers, architecture, and synthesis for embedded systems
Scratch-pad memory allocation without compiler support for java applications
CASES '07 Proceedings of the 2007 international conference on Compilers, architecture, and synthesis for embedded systems
Software controlled memory layout reorganization for irregular array access patterns
CASES '07 Proceedings of the 2007 international conference on Compilers, architecture, and synthesis for embedded systems
Program optimization space pruning for a multithreaded gpu
Proceedings of the 6th annual IEEE/ACM international symposium on Code generation and optimization
Hybrid solid-state disks: combining heterogeneous NAND flash in large SSDs
Proceedings of the 2008 Asia and South Pacific Design Automation Conference
Efficient dynamic heap allocation of scratch-pad memory
Proceedings of the 7th international symposium on Memory management
Compiler driven data layout optimization for regular/irregular array access patterns
Proceedings of the 2008 ACM SIGPLAN-SIGBED conference on Languages, compilers, and tools for embedded systems
Access pattern-based code compression for memory-constrained systems
ACM Transactions on Design Automation of Electronic Systems (TODAES)
Predictable programming on a precision timed architecture
CASES '08 Proceedings of the 2008 international conference on Compilers, architectures and synthesis for embedded systems
A Framework for Task Scheduling and Memory Partitioning for Multi-Processor System-on-Chip
HiPEAC '09 Proceedings of the 4th International Conference on High Performance Embedded Architectures and Compilers
Memory allocation for embedded systems with a compile-time-unknown scratch-pad size
ACM Transactions on Embedded Computing Systems (TECS)
A software solution for dynamic stack management on scratch pad memory
Proceedings of the 2009 Asia and South Pacific Design Automation Conference
Software transactional memory for multicore embedded systems
Proceedings of the 2009 ACM SIGPLAN/SIGBED conference on Languages, compilers, and tools for embedded systems
Compiler-directed scratchpad memory management via graph coloring
ACM Transactions on Architecture and Code Optimization (TACO)
SPMTM: A Novel ScratchPad Memory Based Hybrid Nested Transactional Memory Framework
APPT '09 Proceedings of the 8th International Symposium on Advanced Parallel Processing Technologies
Performance balancing: software-based on-chip memory management for effective CMP executions
Proceedings of the 10th workshop on MEmory performance: DEaling with Applications, systems and architecture
Instruction cache locking inside a binary rewriter
CASES '09 Proceedings of the 2009 international conference on Compilers, architecture, and synthesis for embedded systems
Adaptive scratch pad memory management for dynamic behavior of multimedia applications
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
A software-only solution to use scratch pads for stack data
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
Scratchpad allocation for concurrent embedded software
ACM Transactions on Programming Languages and Systems (TOPLAS)
A hardware/software framework for instruction and data scratchpad memory allocation
ACM Transactions on Architecture and Code Optimization (TACO)
Reinventing computing for real time
Proceedings of the 12th Monterey conference on Reliable systems on unreliable networked platforms
ECOOP'07 Proceedings of the 2007 conference on Object-oriented technology
Implementation, compilation, optimization of object-oriented languages, programs and systems
ECOOP'06 Proceedings of the 2006 conference on Object-oriented technology: ECOOP 2006 workshop reader
SDRM: simultaneous determination of regions and function-to-region mapping for scratchpad memories
HiPC'08 Proceedings of the 15th international conference on High performance computing
A framework for automatic parallelization, static and dynamic memory optimization in MPSoC platforms
Proceedings of the 47th Design Automation Conference
Partitioning and allocation of scratch-pad memory for priority-based preemptive multi-task systems
Proceedings of the Conference on Design, Automation and Test in Europe
CASES '10 Proceedings of the 2010 international conference on Compilers, architectures and synthesis for embedded systems
Heap data management for limited local memory (LLM) multi-core processors
CODES/ISSS '10 Proceedings of the eighth IEEE/ACM/IFIP international conference on Hardware/software codesign and system synthesis
Scratchpad memory allocation for data aggregates via interval coloring in superperfect graphs
ACM Transactions on Embedded Computing Systems (TECS)
Co-optimization of memory access and task scheduling on MPSoC architectures with multi-level memory
Proceedings of the 2010 Asia and South Pacific Design Automation Conference
Dynamic and adaptive SPM management for a multi-task environment
Journal of Systems Architecture: the EUROMICRO Journal
VEGAS: soft vector processor with scratchpad memory
Proceedings of the 19th ACM/SIGDA international symposium on Field programmable gate arrays
Overlay techniques for scratchpad memories in low power embedded processors
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Proceedings of the 2011 SIGPLAN/SIGBED conference on Languages, compilers and tools for embedded systems
Exploiting time predictable two-level scratchpad memory for real-time systems
Proceedings of the 2011 ACM Symposium on Applied Computing
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Cache-tuning-aware scratchpad allocation from binaries
Proceedings of the 24th symposium on Integrated circuits and systems design
A reuse-aware prefetching scheme for scratchpad memory
Proceedings of the 48th Design Automation Conference
FELI: HW/SW support for on-chip distributed shared memory in multicores
Euro-Par'11 Proceedings of the 17th international conference on Parallel processing - Volume Part I
Demand Paging Techniques for Flash Memory Using Compiler Post-Pass Optimizations
ACM Transactions on Embedded Computing Systems (TECS)
DynaPoMP: dynamic policy-driven memory protection for SPM-based embedded systems
WESS '11 Proceedings of the Workshop on Embedded Systems Security
Power efficient instruction caches for embedded systems
SAMOS'05 Proceedings of the 5th international conference on Embedded Computer Systems: architectures, Modeling, and Simulation
Optimizing local memory allocation and assignment through a decoupled approach
LCPC'09 Proceedings of the 22nd international conference on Languages and Compilers for Parallel Computing
Efficient scratchpad allocation algorithms for energy constrained embedded systems
PACS'03 Proceedings of the Third international conference on Power - Aware Computer Systems
A decoupled local memory allocator
ACM Transactions on Architecture and Code Optimization (TACO) - Special Issue on High-Performance Embedded Architectures and Compilers
Data cache organization for accurate timing analysis
Real-Time Systems
Towards a performance- and energy-efficient data filter cache
Proceedings of the 10th Workshop on Optimizations for DSP and Embedded Systems
A software-only scheme for managing heap data on limited local memory(LLM) multicore processors
ACM Transactions on Embedded Computing Systems (TECS)
Optimizing Data Placement of Loops for Energy Minimization with Multiple Types of Memories
Journal of Signal Processing Systems
Minimizing accumulative memory load cost on multi-core DSPs with multi-level memory
Journal of Systems Architecture: the EUROMICRO Journal
Scheduling of synchronous data flow models onto scratchpad memory-based embedded processors
ACM Transactions on Embedded Computing Systems (TECS) - Special Section on ESTIMedia'10
SPM-Sieve: a framework for assisting data partitioning in scratch pad memory based systems
Proceedings of the 2013 International Conference on Compilers, Architectures and Synthesis for Embedded Systems
ACM Transactions on Embedded Computing Systems (TECS)
Hi-index | 0.00 |
This article presents a technique for the efficient compiler management of software-exposed heterogeneous memory. In many lower-end embedded chips, often used in microcontrollers and DSP processors, heterogeneous memory units such as scratch-pad SRAM, internal DRAM, external DRAM, and ROM are visible directly to the software, without automatic management by a hardware caching mechanism. Instead, the memory units are mapped to different portions of the address space. Caches are avoided due to their cost and power consumption, and because they make it difficult to guarantee real-time performance. For this important class of embedded chips, the allocation of data to different memory units to maximize performance is the responsibility of the software.Current practice typically leaves it to the programmer to partition the data among different memory units. We present a compiler strategy that automatically partitions the data among the memory units. We show that this strategy is optimal, relative to the profile run, among all static partitions for global and stack data. For the first time, our allocation scheme for stacks distributes the stack among multiple memory units. For global and stack data, the scheme is provably equal to or better than any other compiler scheme or set of programmer annotations. Results from our benchmarks show a 44.2% reduction in runtime from using our distributed stack strategy vs. using a unified stack, and a further 11.8% reduction in runtime from using a linear optimization strategy for allocation vs. a simpler greedy strategy; both in the case of the SRAM size being 20% of the total data size. For some programs, less than 5% of data in SRAM achieves a similar speedup.