This paper solves the open problem of extracting the maximal number of loop iterations that can be executed in parallel on chip multiprocessors. Our algorithm solves it optimally, in two phases, by migrating the weights of parallelism-inhibiting dependences on dependence cycles. In the first phase, we model dependence migration with retiming and formulate this classic loop parallelization problem as a graph optimization problem: finding retiming values for the nodes of the loop's dependence graph so that the minimum non-zero edge weight in the graph is maximized. We present the algorithm in three stages, each built incrementally on the preceding one. In the second phase, the optimal code for the loop is generated from the retimed graph found in the first phase. We demonstrate the effectiveness of our optimal algorithm by comparing it with a number of representative non-optimal algorithms on a set of benchmarks frequently used in prior work.
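The graph formulation above can be illustrated with a small sketch. It assumes the standard retiming rule w_r(u → v) = w(u → v) + r(u) − r(v) and treats each edge weight as a dependence distance in iterations (0 meaning an intra-iteration dependence), so the minimum non-zero weight bounds how many consecutive iterations can run in parallel. The search here is a naive exhaustive one over a bounded range of retiming values; it is not the paper's three-stage algorithm, and the function names, the `max_r` bound, and the example graph are hypothetical.

```python
import math
from itertools import product


def retime(edges, r):
    """Retimed edge weights: w_r(u -> v) = w(u -> v) + r(u) - r(v)."""
    return [(u, v, w + r[u] - r[v]) for u, v, w in edges]


def min_nonzero_weight(edges):
    """Smallest loop-carried dependence distance; inf if every edge has weight 0."""
    nonzero = [w for _, _, w in edges if w > 0]
    return min(nonzero) if nonzero else math.inf


def brute_force_retiming(nodes, edges, max_r):
    """Try every r(v) in 0..max_r (a hypothetical bound) and keep the legal
    retiming that maximizes the minimum non-zero edge weight.
    Exponential in len(nodes): suitable for toy graphs only."""
    best_score, best_r = -1, None
    for values in product(range(max_r + 1), repeat=len(nodes)):
        r = dict(zip(nodes, values))
        retimed = retime(edges, r)
        if any(w < 0 for _, _, w in retimed):
            continue  # illegal: a statement would depend on a future iteration
        score = min_nonzero_weight(retimed)
        if score > best_score:
            best_score, best_r = score, r
    return best_score, best_r


if __name__ == "__main__":
    # Hypothetical two-statement loop: B depends on A from 1 iteration earlier,
    # A depends on B from 3 iterations earlier; the cycle weight is 4.
    nodes = ["A", "B"]
    edges = [("A", "B", 1), ("B", "A", 3)]
    print(min_nonzero_weight(edges))                    # 1 before retiming
    print(brute_force_retiming(nodes, edges, max_r=2))  # (4, {'A': 0, 'B': 1})
```

Because retiming preserves the total weight of every cycle, the cycle weight of 4 is an upper bound in this example; the retiming r(A) = 0, r(B) = 1 reaches it by migrating the entire weight onto the edge B → A, leaving A → B as an intra-iteration dependence.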