Optimal loop parallelization for maximizing iteration-level parallelism

Authors:
Duo Liu; Zili Shao; Meng Wang; Minyi Guo; Jingling Xue
Affiliations:
The Hong Kong Polytechnic University, Hung Hom, Hong Kong; The Hong Kong Polytechnic University, Hung Hom, Hong Kong; The Hong Kong Polytechnic University, Hung Hom, Hong Kong; Shanghai Jiao Tong University, Shanghai, China; The University of New South Wales, Sydney, Australia
Venue:
CASES '09 Proceedings of the 2009 international conference on Compilers, architecture, and synthesis for embedded systems
Year:
2009


Abstract

This paper solves the open problem of extracting the maximal number of iterations from a loop that can be executed in parallel on chip multiprocessors. Our algorithm solves it optimally by migrating the weights of parallelism-inhibiting dependences on dependence cycles in two phases. First, we model dependence migration with retiming and formulate this classic loop parallelization problem as a graph optimization problem, i.e., one of finding retiming values for the nodes of the dependence graph so that the minimum non-zero edge weight in the graph is maximized. We present our algorithm in three stages, each built incrementally on the preceding one. Second, the optimal code for the loop is generated from the retimed graph found in the first phase. We demonstrate the effectiveness of our optimal algorithm by comparing it with a number of representative non-optimal algorithms on a set of benchmarks frequently used in prior work.
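The graph formulation in the abstract can be illustrated with a small sketch. This is not the paper's algorithm: it is a brute-force search over a bounded range of retiming values, shown only to make the objective concrete. It assumes the common retiming convention in which an edge u -> v with weight w becomes w + r(v) - r(u); the function names (`retime`, `min_nonzero_weight`, `best_retiming`), the search bound, and the two-node example graph are all hypothetical.

```python
from itertools import product


def retime(edges, r):
    """Apply retiming r to every edge, assuming w_r(u, v) = w + r(v) - r(u)."""
    return {(u, v): w + r[v] - r[u] for (u, v), w in edges.items()}


def min_nonzero_weight(weights):
    """Return the smallest non-zero edge weight, or None if all are zero."""
    nonzero = [w for w in weights.values() if w != 0]
    return min(nonzero) if nonzero else None


def best_retiming(nodes, edges, bound):
    """Brute-force search (illustration only, exponential in len(nodes)).

    Tries every retiming with values in [-bound, bound], pinning the first
    node to 0, and keeps the legal one (no negative edge weight) whose
    minimum non-zero edge weight is largest.
    """
    best_r, best_score = None, -1
    first, rest = nodes[0], nodes[1:]
    for values in product(range(-bound, bound + 1), repeat=len(rest)):
        r = {first: 0, **dict(zip(rest, values))}
        retimed = retime(edges, r)
        if any(w < 0 for w in retimed.values()):
            continue  # a negative weight would reverse a dependence
        score = min_nonzero_weight(retimed)
        if score is not None and score > best_score:
            best_r, best_score = r, score
    return best_r, best_score


# Hypothetical dependence cycle: A -> B with distance 1, B -> A with distance 3.
nodes = ["A", "B"]
edges = {("A", "B"): 1, ("B", "A"): 3}
r, score = best_retiming(nodes, edges, bound=4)
print(r, score, retime(edges, r))
```

In this example the total weight of the cycle (4) is unchanged by retiming, but the search moves all of it onto one edge, which raises the minimum non-zero edge weight from 1 to 4. The dictionary keyed by node pairs cannot represent parallel edges between the same two nodes; a fuller sketch would use an edge list.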