Loop Distribution and Fusion with Timing and Code Size Optimization

Authors:
Meilin Liu;Edwin H. Sha;Qingfeng Zhuge;Yi He;Meikang Qiu
Affiliations:
Department of Computer Science, Wright State University, Dayton, USA 45435;Department of Computer Science, University of Texas at Dallas, Richardson, USA 75083;Department of Computer Science, University of Texas at Dallas, Richardson, USA 75083;Department of Computer Science, University of Texas at Dallas, Richardson, USA 75083;Department of Electrical and Computer Engineering, University of Kentucky, Lexington, USA 40506
Venue:
Journal of Signal Processing Systems
Year:
2011

Citing 21
Cited 0

Loop distribution with arbitrary control flow

Proceedings of the 1990 ACM/IEEE conference on Supercomputing
Optimizing for parallelism and data locality

ICS '92 Proceedings of the 6th international conference on Supercomputing
Memory bandwidth limitations of future microprocessors

ISCA '96 Proceedings of the 23rd annual international symposium on Computer architecture
Improving data locality with loop transformations

ACM Transactions on Programming Languages and Systems (TOPLAS)
Fusion of Loops for Parallelism and Locality

IEEE Transactions on Parallel and Distributed Systems
Optimizing Overall Loop Schedules Using Prefetching and Partitioning

IEEE Transactions on Parallel and Distributed Systems
Data and memory optimization techniques for embedded systems

ACM Transactions on Design Automation of Electronic Systems (TODAES)
Optimizing compilers for modern architectures: a dependence-based approach

Optimizing compilers for modern architectures: a dependence-based approach
Custom Memory Management Methodology: Exploration of Memory Organisation for Embedded Multimedia System Design

Custom Memory Management Methodology: Exploration of Memory Organisation for Embedded Multimedia System Design
High Performance Compilers for Parallel Computing

High Performance Compilers for Parallel Computing
A Loop Transformation Theory and an Algorithm to Maximize Parallelism

IEEE Transactions on Parallel and Distributed Systems
Maximizing Loop Parallelism and Improving Data Locality via Loop Fusion and Distribution

Proceedings of the 6th International Workshop on Languages and Compilers for Parallel Computing
On the Complexity of Loop Fusion

PACT '99 Proceedings of the 1999 International Conference on Parallel Architectures and Compilation Techniques
Transforming Complex Loop Nests for Locality

The Journal of Supercomputing
Memory Requirement Optimization with Loop Fusion and Loop Shifting

DSD '04 Proceedings of the Digital System Design, EUROMICRO Systems
General loop fusion technique for nested loops considering timing and code size

Proceedings of the 2004 international conference on Compilers, architecture, and synthesis for embedded systems
Predicting Inter-Thread Cache Contention on a Chip Multi-Processor Architecture

HPCA '05 Proceedings of the 11th International Symposium on High-Performance Computer Architecture
Improving Memory Hierarchy Performance through Combined Loop Interchange and Multi-Level Fusion

International Journal of High Performance Computing Applications
Lattice-Based Memory Allocation

IEEE Transactions on Computers
Register aware scheduling for distributed cache clustered architecture

ASP-DAC '03 Proceedings of the 2003 Asia and South Pacific Design Automation Conference
Engineering A Compiler

Engineering A Compiler

Quantified Score

Hi-index	0.00

Visualization

Abstract

In this paper, a technique that combines loop distribution with maximum direct loop fusion (LD_MDF) is proposed. The technique performs maximum loop distribution, followed by maximum direct loop fusion to optimize timing and code size simultaneously. The loop distribution theorems that state the conditions distributing any multi-level nested loop in the maximum way are proved. It is proved that the statements involved in the dependence cycle can be fully distributed if the summation of the edge weight of the dependence cycle satisfies a certain condition; otherwise, the statements should be put in the same loop after loop distribution. Based on the loop distribution theorems, algorithms are designed to conduct maximum loop distribution. The maximum direct loop fusion problem is mapped to the graph partitioning problem. A polynomial graph partitioning algorithm is developed to compute the fusion partitions. It is proved that the proposed maximum direct loop fusion algorithm produces the fewest number of resultant loop nests without violating dependence constraints. It is also shown that the resultant code size of the fused loops by the technique of loop distribution with maximum direct loop fusion is smaller than the code size of the original loops when the number of fused loops is less than the number of the original loops. The simulation results are presented to validate the proposed technique.