The memory bandwidth largely determines the performance and energy cost of embedded systems. At the compiler level, several techniques improve memory bandwidth usage within the scope of a basic block, but they often fail to exploit all of the available bandwidth. We propose a technique that optimizes memory bandwidth usage across basic-block boundaries. Our technique incrementally fuses loops to make better use of the available bandwidth. The resulting performance depends on how the data is assigned to the memories of the memory layer, and this assignment also strongly influences the energy cost. Therefore, our approach combines the fusion and assignment decisions. Designers can use its output to trade off energy cost against system performance.
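The abstract does not give the cost models, so the following is only a minimal Python sketch of the general idea under hypothetical assumptions, not the authors' method. Every name (`Memory`, `Loop`, `fuse_incrementally`, `explore`) is invented for illustration. The sketch assumes that a loop body's accesses can be scheduled in parallel across memories, so the busiest memory (accesses divided by ports) bounds the cycles per iteration. It also assumes that two adjacent loops may be fused whenever their trip counts match, ignoring data dependences. Energy is modeled as accesses times a per-access cost of the assigned memory. For each capacity-feasible array-to-memory assignment, loops are fused greedily while fusion lowers the cycle count, and only the Pareto-optimal (energy, cycles) points are kept.

```python
from dataclasses import dataclass
from itertools import product
from math import ceil


@dataclass(frozen=True)
class Memory:                      # hypothetical memory model
    name: str
    ports: int                     # accesses served per cycle
    energy_per_access: float
    capacity: int                  # words


@dataclass
class Loop:                        # hypothetical loop model
    trip_count: int
    accesses: dict                 # array name -> accesses per iteration


def cycles_per_iteration(loop, assignment, memories):
    """Busiest memory bounds a body whose accesses run in parallel."""
    load = {m.name: 0 for m in memories}
    for array, count in loop.accesses.items():
        load[assignment[array]] += count
    return max(1, max(ceil(load[m.name] / m.ports) for m in memories))


def total_cycles(loops, assignment, memories):
    return sum(l.trip_count * cycles_per_iteration(l, assignment, memories)
               for l in loops)


def energy(loops, assignment, memories):
    cost = {m.name: m.energy_per_access for m in memories}
    return sum(l.trip_count * n * cost[assignment[a]]
               for l in loops for a, n in l.accesses.items())


def fuse(a, b):
    """Merge two bodies (assumes equal trip counts; dependences ignored)."""
    merged = dict(a.accesses)
    for array, count in b.accesses.items():
        merged[array] = merged.get(array, 0) + count
    return Loop(a.trip_count, merged)


def fuse_incrementally(loops, assignment, memories):
    """Greedily fuse adjacent loops while fusion reduces the cycle count."""
    loops = list(loops)
    i = 0
    while i < len(loops) - 1:
        a, b = loops[i], loops[i + 1]
        if a.trip_count == b.trip_count:
            fused = fuse(a, b)
            if (total_cycles([fused], assignment, memories)
                    < total_cycles([a, b], assignment, memories)):
                loops[i:i + 2] = [fused]   # retry with the next neighbour
                continue
        i += 1
    return loops


def explore(loops, sizes, memories):
    """Return Pareto points (energy, cycles, #loops, assignment)."""
    arrays = sorted(sizes)
    best = {}
    for choice in product(memories, repeat=len(arrays)):
        used = {m.name: 0 for m in memories}
        for array, mem in zip(arrays, choice):
            used[mem.name] += sizes[array]
        if any(used[m.name] > m.capacity for m in memories):
            continue                       # assignment violates a capacity
        assignment = {a: m.name for a, m in zip(arrays, choice)}
        fused = fuse_incrementally(loops, assignment, memories)
        key = (energy(fused, assignment, memories),
               total_cycles(fused, assignment, memories))
        best.setdefault(key, (*key, len(fused), assignment))
    front = [p for p in best.values()
             if not any(q[0] <= p[0] and q[1] <= p[1] and q[:2] != p[:2]
                        for q in best.values())]
    return sorted(front, key=lambda p: p[:2])


if __name__ == "__main__":
    mems = [Memory("onchip", 1, 1.0, 2048), Memory("offchip", 1, 10.0, 1 << 20)]
    prog = [Loop(100, {"a": 1, "b": 1}), Loop(100, {"c": 1, "d": 1})]
    for e, c, n, a in explore(prog, {k: 500 for k in "abcd"}, mems):
        print(f"energy={e:.0f} cycles={c} loops={n} assignment={a}")
```

With these made-up parameters the sketch reports three trade-off points: all arrays on chip (energy 400, 400 cycles), one array off chip (1300, 300), and two arrays off chip (2200, 200). In the last point, placing `a`, `b` on chip and `c`, `d` off chip makes fusion worthwhile, because the fused body keeps both memory ports busy in every cycle.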