Loop distribution with arbitrary control flow
Proceedings of the 1990 ACM/IEEE conference on Supercomputing
Optimizing for parallelism and data locality
ICS '92 Proceedings of the 6th international conference on Supercomputing
Memory bandwidth limitations of future microprocessors
ISCA '96 Proceedings of the 23rd annual international symposium on Computer architecture
Improving data locality with loop transformations
ACM Transactions on Programming Languages and Systems (TOPLAS)
Fusion of Loops for Parallelism and Locality
IEEE Transactions on Parallel and Distributed Systems
Optimizing Overall Loop Schedules Using Prefetching and Partitioning
IEEE Transactions on Parallel and Distributed Systems
Data and memory optimization techniques for embedded systems
ACM Transactions on Design Automation of Electronic Systems (TODAES)
Optimizing compilers for modern architectures: a dependence-based approach
Optimizing compilers for modern architectures: a dependence-based approach
Custom Memory Management Methodology: Exploration of Memory Organisation for Embedded Multimedia System Design
High Performance Compilers for Parallel Computing
High Performance Compilers for Parallel Computing
A Loop Transformation Theory and an Algorithm to Maximize Parallelism
IEEE Transactions on Parallel and Distributed Systems
Maximizing Loop Parallelism and Improving Data Locality via Loop Fusion and Distribution
Proceedings of the 6th International Workshop on Languages and Compilers for Parallel Computing
On the Complexity of Loop Fusion
PACT '99 Proceedings of the 1999 International Conference on Parallel Architectures and Compilation Techniques
Transforming Complex Loop Nests for Locality
The Journal of Supercomputing
Memory Requirement Optimization with Loop Fusion and Loop Shifting
DSD '04 Proceedings of the Digital System Design, EUROMICRO Systems
General loop fusion technique for nested loops considering timing and code size
Proceedings of the 2004 international conference on Compilers, architecture, and synthesis for embedded systems
Predicting Inter-Thread Cache Contention on a Chip Multi-Processor Architecture
HPCA '05 Proceedings of the 11th International Symposium on High-Performance Computer Architecture
Improving Memory Hierarchy Performance through Combined Loop Interchange and Multi-Level Fusion
International Journal of High Performance Computing Applications
Lattice-Based Memory Allocation
IEEE Transactions on Computers
Register aware scheduling for distributed cache clustered architecture
ASP-DAC '03 Proceedings of the 2003 Asia and South Pacific Design Automation Conference
Engineering A Compiler
Hi-index | 0.00 |
In this paper, a technique that combines loop distribution with maximum direct loop fusion (LD_MDF) is proposed. The technique performs maximum loop distribution, followed by maximum direct loop fusion to optimize timing and code size simultaneously. The loop distribution theorems that state the conditions distributing any multi-level nested loop in the maximum way are proved. It is proved that the statements involved in the dependence cycle can be fully distributed if the summation of the edge weight of the dependence cycle satisfies a certain condition; otherwise, the statements should be put in the same loop after loop distribution. Based on the loop distribution theorems, algorithms are designed to conduct maximum loop distribution. The maximum direct loop fusion problem is mapped to the graph partitioning problem. A polynomial graph partitioning algorithm is developed to compute the fusion partitions. It is proved that the proposed maximum direct loop fusion algorithm produces the fewest number of resultant loop nests without violating dependence constraints. It is also shown that the resultant code size of the fused loops by the technique of loop distribution with maximum direct loop fusion is smaller than the code size of the original loops when the number of fused loops is less than the number of the original loops. The simulation results are presented to validate the proposed technique.