Optimizing the memory bandwidth with loop fusion

Authors:
Paul Marchal;José Ignacio Gómez;Francky Catthoor
Affiliations:
IMEC/KULEUVEN, Heverlee, Belgium;DACYA U.C.M., Madrid, Spain;IMEC/KULEUVEN, Heverlee, Belgium
Venue:
Proceedings of the 2nd IEEE/ACM/IFIP international conference on Hardware/software codesign and system synthesis
Year:
2004

Abstract

Memory bandwidth largely determines the performance and energy cost of embedded systems. At the compiler level, several techniques improve memory bandwidth usage within the scope of a basic block, but they often fail to exploit all of the available bandwidth. We propose a technique that optimizes memory bandwidth across basic-block boundaries. It incrementally fuses loops to make better use of the available bandwidth. The resulting performance depends on how the data is assigned to the memories of the memory layer, and this assignment also strongly influences the energy cost. We therefore combine the fusion and assignment decisions in our approach. Designers can use our output to trade off energy cost against system performance.
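
To illustrate why fusion and data assignment interact, the following is a minimal sketch under a toy cost model. It is not the authors' algorithm. It assumes single-port memories, equal trip counts, one cycle per access, and no dependences that forbid fusion. The names `cycles`, `fuse`, `mem0`, and `mem1` are hypothetical and do not come from the paper.

```python
def cycles(loops, assignment, n):
    """Count memory cycles for a sequence of loops (toy model).

    loops: list of loop bodies, each a list of array names accessed
           once per iteration.
    assignment: dict mapping array name -> memory name; every memory
                is assumed to be single-port (one access per cycle).
    n: iteration count shared by all loops.
    """
    total = 0
    for body in loops:
        # Accesses to different memories can overlap; accesses to the
        # same memory serialize, so the busiest memory sets the pace.
        per_memory = {}
        for array in body:
            mem = assignment[array]
            per_memory[mem] = per_memory.get(mem, 0) + 1
        total += n * max(per_memory.values())
    return total


def fuse(loops):
    """Merge all loop bodies into one (assumes fusion is legal)."""
    return [[array for body in loops for array in body]]


# Two loops, each touching one array per iteration.
unfused = [["A"], ["B"]]
split = {"A": "mem0", "B": "mem1"}    # arrays in different memories
shared = {"A": "mem0", "B": "mem0"}   # arrays in the same memory

unfused_split = cycles(unfused, split, 100)
fused_split = cycles(fuse(unfused), split, 100)
fused_shared = cycles(fuse(unfused), shared, 100)
```

In this model, fusion halves the cycle count only when the two arrays sit in different memories. With both arrays in one memory the fused loop is no faster than the original pair, which is why the fusion and assignment decisions are worth considering together.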