Advanced compiler optimizations for supercomputers
Communications of the ACM - Special issue on parallelism
Theory of linear and integer programming
Theory of linear and integer programming
A Scheme to Enforce Data Dependence on Large Multiprocessor Systems
IEEE Transactions on Software Engineering
Automatic translation of FORTRAN programs to vector form
ACM Transactions on Programming Languages and Systems (TOPLAS)
Guided self-scheduling: A practical scheduling scheme for parallel supercomputers
IEEE Transactions on Computers
Compiler algorithms for synchronization
IEEE Transactions on Computers
Compiler Optimizations for Enhancing Parallelism and Their Impact on Architecture Design
IEEE Transactions on Computers - Special issue on architectural support for programming languages and operating systems
Loop quantization: a generalized loop unwinding technique
Journal of Parallel and Distributed Computing - Special Issue on Languages, Compilers and environments for Parallel Programming
Minimum Distance: A Method for Partitioning Recurrences for Multiprocessors
IEEE Transactions on Computers
Dynamic Processor Self-Scheduling for General Parallel Nested Loops
IEEE Transactions on Computers
Program Improvement by Source-to-Source Transformation
Journal of the ACM (JACM)
Algorithm and bound for the greatest common divisor of n integers
Communications of the ACM
Dependence graphs and compiler optimizations
POPL '81 Proceedings of the 8th ACM SIGPLAN-SIGACT symposium on Principles of programming languages
Structure of Computers and Computations
Structure of Computers and Computations
An Empirical Study of Fortran Programs for Parallelizing Compilers
IEEE Transactions on Parallel and Distributed Systems
Multiprocessors: discussion of some theoretical and practical problems
Multiprocessors: discussion of some theoretical and practical problems
Optimizing supercompilers for supercomputers
Optimizing supercompilers for supercomputers
Compile-time scheduling and optimization for asynchronous machines (multiprocessor, compiler, parallel processing)
Reducing data communication overhead for DOACROSS loop nests
ICS '94 Proceedings of the 8th international conference on Supercomputing
ICS '94 Proceedings of the 8th international conference on Supercomputing
Mappings for communication minimization using distribution and alignment
PACT '95 Proceedings of the IFIP WG10.3 working conference on Parallel architectures and compilation techniques
A loop parallelization technique for linear dependence vector
PACT '95 Proceedings of the IFIP WG10.3 working conference on Parallel architectures and compilation techniques
An Efficient Solution to the Cache Thrashing Problem Caused by True Data Sharing
IEEE Transactions on Computers
Chain Grouping: A Method for Partitioning Loops onto Mesh-Connected Processor Arrays
IEEE Transactions on Parallel and Distributed Systems
Minimizing Completion Time for Loop Tiling with Computation and Communication Overlapping
IPDPS '01 Proceedings of the 15th International Parallel & Distributed Processing Symposium
Interprocedural Transformations for Extracting Maximum Parallelism
ADVIS '02 Proceedings of the Second International Conference on Advances in Information Systems
Optimal task scheduling at run time to exploit intra-tile parallelism
Parallel Computing
Partitioning Loops with Variable Dependence Distances
ICPP '00 Proceedings of the Proceedings of the 2000 International Conference on Parallel Processing
Extracting Parallelism in Nested Loops
COMPSAC '96 Proceedings of the 20th Conference on Computer Software and Applications
Removing false code dependencies to speedup software build processes
CASCON '03 Proceedings of the 2003 conference of the Centre for Advanced Studies on Collaborative research
Journal of Parallel and Distributed Computing
Automatic parallel code generation for tiled nested loops
Proceedings of the 2004 ACM symposium on Applied computing
A novel approach for partitioning iteration spaces with variable densities
Proceedings of the tenth ACM SIGPLAN symposium on Principles and practice of parallel programming
Automatic array partitioning based on the Smith normal form
International Journal of Parallel Programming
A general approach for partitioning N-dimensional parallel nested loops with conditionals
Proceedings of the eighteenth annual ACM symposium on Parallelism in algorithms and architectures
Message-passing code generation for non-rectangular tiling transformations
Parallel Computing
Cache-aware iteration space partitioning
Proceedings of the 13th ACM SIGPLAN Symposium on Principles and practice of parallel programming
Transformations techniques for extracting parallelism in non-uniform nested loops
WSEAS Transactions on Computers
Cache-aware partitioning of multi-dimensional iteration spaces
SYSTOR '09 Proceedings of SYSTOR 2009: The Israeli Experimental Systems Conference
Optimizing shared cache behavior of chip multiprocessors
Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture
The Fortran parallel transformer and its programming environment
Information Sciences: an International Journal
Selecting the tile shape to reduce the total communication volume
IPDPS'06 Proceedings of the 20th international conference on Parallel and distributed processing
A geometric approach for partitioning n-dimensional non-rectangular iteration spaces
LCPC'04 Proceedings of the 17th international conference on Languages and Compilers for High Performance Computing
A study of performance scalability by parallelizing loop iterations on multi-core SMPs
ICA3PP'10 Proceedings of the 10th international conference on Algorithms and Architectures for Parallel Processing - Volume Part I
Extracting coarse-grained parallelism for affine perfectly nested quasi-uniform loops
PPAM'11 Proceedings of the 9th international conference on Parallel Processing and Applied Mathematics - Volume Part I
Hi-index | 0.00 |
A general method for the identification of the independent subsets in loops with constant dependence vectors is presented. It is shown that the dependence relation remains invariant under a unimodular transformation. Then a unimodular transformation is used to bring the dependence matrix into a form where the independent subsets are obtained bya direct and inexpensive partitioning algorithm. This leads to a procedure for the automatic conversion of a serial loop into a nest of parallel DO-ALL loops. Another unimodular transformation results in an algorithm to label the dependent iterations of an n-fold nested loop in O(n/sup 2/) time. This provides a multithreaded dynamic scheduling scheme requiring only one fork and one join primitive.