Loop fusion is commonly used to improve the instruction-level parallelism of loops in high-performance embedded computing systems. It is not always directly applicable, however, because fusion-preventing dependencies may exist among the loops. Most existing techniques still have limitations that keep them from fully exploiting the advantages of loop fusion. In this paper, we present a general loop fusion technique for loops and nested loops, based on the loop dependency graph model, retiming, and multi-dimensional retiming. We show that any "J+K" model loop can be legally fused using our legalizing fusion technique. We develop polynomial-time algorithms that solve the loop fusion problem for "J+K" model loops while considering both the timing and the code size of the final code. Our technique produces the final code and calculates the resulting code size directly from the retiming values. The experimental results show that our loop fusion technique always significantly reduces the schedule length.
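To make the notion of a fusion-preventing dependency concrete, the following is a minimal one-dimensional sketch, not the algorithm of the paper. It assumes two simple loops in which the second reads a[i + 1], so naive fusion would read a value before it is written. Shifting the second statement by one iteration (a retiming value of 1, chosen by hand here) makes the fusion legal at the cost of a one-statement prologue. The function names unfused, naive_fused, and retimed_fused are hypothetical and serve only this illustration.

```python
def unfused(x):
    """Reference version: two separate loops."""
    n = len(x)
    a = [0] * n
    b = [0] * max(n - 1, 0)
    for i in range(n):
        a[i] = 2 * x[i]
    for i in range(n - 1):
        b[i] = a[i] + a[i + 1]  # depends on a[i + 1] from the first loop
    return a, b


def naive_fused(x):
    """Illegal fusion: b[i] reads a[i + 1] before it has been computed."""
    n = len(x)
    a = [0] * n
    b = [0] * max(n - 1, 0)
    for i in range(n):
        a[i] = 2 * x[i]
        if i < n - 1:
            b[i] = a[i] + a[i + 1]  # a[i + 1] still holds its initial 0
    return a, b


def retimed_fused(x):
    """Legal fusion: the second statement is delayed by one iteration."""
    n = len(x)
    a = [0] * n
    b = [0] * max(n - 1, 0)
    if n == 0:
        return a, b
    a[0] = 2 * x[0]  # prologue: first iteration of the first loop
    for i in range(1, n):
        a[i] = 2 * x[i]
        b[i - 1] = a[i - 1] + a[i]  # both operands are already available
    return a, b


x = [1, 2, 3, 4]
print(unfused(x))
print(retimed_fused(x))
print(naive_fused(x))
```

In this sketch the prologue is the extra code introduced by the shift, which is the kind of code-size cost the abstract refers to when it says code size is calculated from the retiming values. How the paper derives retiming values for "J+K" model loops is not shown here.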