Partitioning and Labeling of Loops by Unimodular Transformations

Authors: E. H. D'Hollander
Venue: IEEE Transactions on Parallel and Distributed Systems
Year: 1992

Abstract

A general method for the identification of the independent subsets in loops with constant dependence vectors is presented. It is shown that the dependence relation remains invariant under a unimodular transformation. Then a unimodular transformation is used to bring the dependence matrix into a form where the independent subsets are obtained by a direct and inexpensive partitioning algorithm. This leads to a procedure for the automatic conversion of a serial loop into a nest of parallel DO-ALL loops. Another unimodular transformation results in an algorithm to label the dependent iterations of an n-fold nested loop in O(n^2) time. This provides a multithreaded dynamic scheduling scheme requiring only one fork and one join primitive.
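
The idea in the abstract can be illustrated with a minimal sketch. It is not the paper's algorithm; it uses a standard lattice argument under the following assumptions: the dependence matrix D is square and nonsingular, its columns are the constant dependence vectors, and loop bounds are ignored. Integer column operations (swap, negate, add a multiple of one column to another) are unimodular, so they leave the set of integer combinations of the dependence vectors unchanged. Once D is lower triangular, each iteration can be reduced to a canonical label in O(n^2) arithmetic steps, and iterations with different labels can never depend on each other. The function names `column_hermite` and `label` are hypothetical.

```python
def column_hermite(D):
    """Return a lower-triangular matrix obtained from the square integer
    matrix D using only unimodular column operations."""
    n = len(D)
    H = [list(row) for row in D]
    for r in range(n):
        while True:
            nonzero = [c for c in range(r, n) if H[r][c] != 0]
            if not nonzero:
                raise ValueError("dependence matrix must be nonsingular")
            # Move the entry of smallest magnitude in row r to the diagonal.
            p = min(nonzero, key=lambda c: abs(H[r][c]))
            for row in H:
                row[r], row[p] = row[p], row[r]
            # Make the pivot positive by negating its column.
            if H[r][r] < 0:
                for row in H:
                    row[r] = -row[r]
            # Reduce the remaining entries of row r modulo the pivot.
            for c in range(r + 1, n):
                q = H[r][c] // H[r][r]
                for row in H:
                    row[c] -= q * row[r]
            if all(H[r][c] == 0 for c in range(r + 1, n)):
                break
    return H


def label(H, x):
    """Reduce iteration vector x against lower-triangular H.
    Iterations with different labels are mutually independent."""
    n = len(H)
    x = list(x)
    for k in range(n):
        q = x[k] // H[k][k]
        # Subtract q times column k; rows above k are zero in that column.
        for r in range(k, n):
            x[r] -= q * H[r][k]
    return tuple(x)


# Assumed example: dependence vectors (1, 1) and (1, -1) stored as columns.
D = [[1, 1],
     [1, -1]]
H = column_hermite(D)
labels = {label(H, (i, j)) for i in range(4) for j in range(4)}
```

For this assumed D the triangular form has diagonal product 2, and the 4 x 4 iteration space falls into two label classes, which correspond to the parity of j - i. Each class could be executed as one independent task; within a bounded iteration space a class may still split into several disconnected chains, which this sketch does not detect.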