This paper presents an approach to transformations for general loops in which dependence vectors represent precedence constraints on the iterations of a loop. Therefore, dependences extracted from a loop nest must be lexicographically positive. This leads to a simple test for the legality of compound transformations: any code transformation that leaves the dependences lexicographically positive is legal. The loop transformation theory is applied to the problem of maximizing the degree of coarse- or fine-grain parallelism in a loop nest. It is shown that the maximum degree of parallelism can be achieved by transforming the loops into a nest of coarsest fully permutable loop nests and wavefronting the fully permutable nests. This canonical form of coarsest fully permutable nests can be transformed mechanically to yield maximum degrees of coarse- and/or fine-grain parallelism. Efficient heuristics can find the maximum degrees of parallelism for loops whose nesting level is less than five.
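
The legality test stated above can be illustrated with a minimal sketch. It assumes, as a simplification not taken from the abstract, that every dependence is a constant integer distance vector and that a transformation is given as an integer matrix applied to those vectors. The names `lex_positive`, `apply_transform`, `is_legal`, and the example dependence set `DEPS` are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence

Vector = Sequence[int]
Matrix = Sequence[Sequence[int]]


def lex_positive(v: Vector) -> bool:
    """Return True if the first nonzero component of v is positive."""
    for x in v:
        if x != 0:
            return x > 0
    return False  # the zero vector is not lexicographically positive


def apply_transform(t: Matrix, d: Vector) -> List[int]:
    """Multiply the transformation matrix t by the dependence vector d."""
    return [sum(row[j] * d[j] for j in range(len(d))) for row in t]


def is_legal(t: Matrix, deps: Sequence[Vector]) -> bool:
    """A transformation is legal if every transformed dependence
    remains lexicographically positive."""
    return all(lex_positive(apply_transform(t, d)) for d in deps)


# Hypothetical dependence set for a two-deep loop nest (assumed example).
DEPS = [(0, 1), (1, -1)]

INTERCHANGE = [[0, 1], [1, 0]]            # swap the two loops
SKEW = [[1, 0], [1, 1]]                   # skew the inner loop by the outer
SKEW_THEN_INTERCHANGE = [[1, 1], [1, 0]]  # INTERCHANGE applied after SKEW

print(is_legal(INTERCHANGE, DEPS))            # False: (1, -1) maps to (-1, 1)
print(is_legal(SKEW, DEPS))                   # True: vectors map to (0, 1), (1, 0)
print(is_legal(SKEW_THEN_INTERCHANGE, DEPS))  # True: vectors map to (1, 0), (0, 1)
```

In this example, interchange alone is rejected, but after skewing every dependence component is non-negative, so the two loops can be interchanged freely. This is the sense in which a compound transformation can be checked with a single matrix product rather than step by step.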