Maximizing parallelism and minimizing synchronization with affine transforms

Authors:
Amy W. Lim; Monica S. Lam
Affiliations:
Computer Systems Laboratory, Stanford University, Stanford, CA; Computer Systems Laboratory, Stanford University, Stanford, CA
Venue:
Proceedings of the 24th ACM SIGPLAN-SIGACT symposium on Principles of programming languages
Year:
1997

Citing 20
Cited 55

Theory of linear and integer programming

Theory of linear and integer programming
Automatic decomposition of scientific programs for parallel execution

POPL '87 Proceedings of the 14th ACM SIGACT-SIGPLAN symposium on Principles of programming languages
Scanning polyhedra with DO loops

PPOPP '91 Proceedings of the third ACM SIGPLAN symposium on Principles and practice of parallel programming
A general framework for iteration-reordering loop transformations

PLDI '92 Proceedings of the ACM SIGPLAN 1992 conference on Programming language design and implementation
Optimizing for parallelism and data locality

ICS '92 Proceedings of the 6th international conference on Supercomputing
A framework for unifying reordering transformations

A framework for unifying reordering transformations
Global optimizations for parallelism and locality on scalable parallel machines

PLDI '93 Proceedings of the ACM SIGPLAN 1993 conference on Programming language design and implementation
Partitioning the statement per iteration space using non-singular matrices

ICS '93 Proceedings of the 7th international conference on Supercomputing
Communication-free hyperplane partitioning of nested loops

Journal of Parallel and Distributed Computing
Determining schedules based on performance estimation

Determining schedules based on performance estimation
Improving locality and parallelism in nested loops

Improving locality and parallelism in nested loops
Some efficient solutions to the affine scheduling problem: I. One-dimensional time

International Journal of Parallel Programming
Compiler transformations for high-performance computing

ACM Computing Surveys (CSUR)
Minimizing communication while preserving parallelism

ICS '96 Proceedings of the 10th international conference on Supercomputing
Loop Transformations for Restructuring Compilers: The Foundations

Loop Transformations for Restructuring Compilers: The Foundations
Optimizing Supercompilers for Supercomputers

Optimizing Supercompilers for Supercomputers
A Loop Transformation Theory and an Algorithm to Maximize Parallelism

IEEE Transactions on Parallel and Distributed Systems
Maximizing Loop Parallelism and Improving Data Locality via Loop Fusion and Distribution

Proceedings of the 6th International Workshop on Languages and Compilers for Parallel Computing
Communication-Free Parallelization via Affine Transformations

LCPC '94 Proceedings of the 7th International Workshop on Languages and Compilers for Parallel Computing
Solving Alignment Using Elementary Linear Algebra

LCPC '94 Proceedings of the 7th International Workshop on Languages and Compilers for Parallel Computing

An affine partitioning algorithm to maximize parallelism and minimize communication

ICS '99 Proceedings of the 13th international conference on Supercomputing
A Loop Transformation Algorithm for Communication Overlapping

International Journal of Parallel Programming - Special issue on international symposium on high performance computing 1997, part I
A unified framework for schedule and storage optimization

Proceedings of the ACM SIGPLAN 2001 conference on Programming language design and implementation
Exact analysis of the cache behavior of nested loops

Proceedings of the ACM SIGPLAN 2001 conference on Programming language design and implementation
Blocking and array contraction across arbitrarily nested loops using affine partitioning

PPoPP '01 Proceedings of the eighth ACM SIGPLAN symposium on Principles and practices of parallel programming
Loop parallelization algorithms

Compiler optimizations for scalable parallel systems
Communication Optimization for Affine Recurrence Equations Using Broadcast and Locality

International Journal of Parallel Programming
High Level Software Synthesis of Affine Iterative Algorithms onto Parallel Architectures

HPCN Europe 2000 Proceedings of the 8th International Conference on High-Performance Computing and Networking
Automatic Generation of OpenMP Directives and Its Application to Computational Fluid Dynamics Codes

ISHPC '00 Proceedings of the Third International Symposium on High Performance Computing
Complexity of Multi-dimensional Loop Alignment

STACS '02 Proceedings of the 19th Annual Symposium on Theoretical Aspects of Computer Science
Scheduling the Computations of a Loop Nest with Respect to a Given Mapping

Euro-Par '00 Proceedings from the 6th International Euro-Par Conference on Parallel Processing
Symbolic Communication Set Generation for Irregular Parallel Applications

The Journal of Supercomputing
Using Graph Models in Retargetable Optimizing Compilers for Microprocessors with VLIW Architectures

Cybernetics and Systems Analysis
A data locality optimizing algorithm

ACM SIGPLAN Notices - Best of PLDI 1979-1999
Applications of storage mapping optimization to register promotion

Proceedings of the 18th annual international conference on Supercomputing
Facilitating the search for compositions of program transformations

Proceedings of the 19th annual international conference on Supercomputing
Stream Programming on General-Purpose Processors

Proceedings of the 38th annual IEEE/ACM International Symposium on Microarchitecture
Data and Computation Transformations for Brook Streaming Applications on Multiprocessors

Proceedings of the International Symposium on Code Generation and Optimization
Semi-automatic composition of loop transformations for deep parallelism and memory hierarchies

International Journal of Parallel Programming
Function level parallelism driven by data dependencies

ACM SIGARCH Computer Architecture News
Iterative Optimization in the Polyhedral Model: Part I, One-Dimensional Time

Proceedings of the International Symposium on Code Generation and Optimization
Maximize Parallelism Minimize Overhead for Nested Loops via Loop Striping

Journal of VLSI Signal Processing Systems
Executing irregular scientific applications on stream architectures

Proceedings of the 21st annual international conference on Supercomputing
A memory-conscious code parallelization scheme

Proceedings of the 44th annual Design Automation Conference
Automatic data movement and computation mapping for multi-level parallel architectures with explicitly managed memories

Proceedings of the 13th ACM SIGPLAN Symposium on Principles and practice of parallel programming
Extracting synchronization-free threads in perfectly nested loops using the Omega Project software

SEPADS'05 Proceedings of the 4th WSEAS International Conference on Software Engineering, Parallel & Distributed Systems
Finding free schedules for parameterized loops with affine dependences represented with a single dependence relation

AIC'05 Proceedings of the 5th WSEAS International Conference on Applied Informatics and Communications
A compiler framework for optimization of affine loop nests for GPGPUs

Proceedings of the 22nd annual international conference on Supercomputing
Iterative optimization in the polyhedral model: Part II, multidimensional time

Proceedings of the 2008 ACM SIGPLAN conference on Programming language design and implementation
Transformations techniques for extracting parallelism in non-uniform nested loops

WSEAS Transactions on Computers
Affine and unimodular transformations for non-uniform nested loops

ICCOMP'08 Proceedings of the 12th WSEAS international conference on Computers
Tile Reduction: The First Step towards Tile Aware Parallelization in OpenMP

IWOMP '09 Proceedings of the 5th International Workshop on OpenMP: Evolving OpenMP in an Age of Extreme Parallelism
Extracting synchronization-free slices of operations in perfectly-nested loops

PDCS '07 Proceedings of the 19th IASTED International Conference on Parallel and Distributed Computing and Systems
Automating the generation of composed linear algebra kernels

Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis
Finding synchronization-free parallelism for non-uniform loops

ICCS'03 Proceedings of the 2003 international conference on Computational science: Part II
Finding coarse grained parallelism in computational geometry algorithms

ICCSA'03 Proceedings of the 2003 international conference on Computational science and its applications: Part III
A profile-based tool for finding pipeline parallelism in sequential programs

Parallel Computing
Semi-automatic extraction and exploitation of hierarchical pipeline parallelism using profiling information

Proceedings of the 19th international conference on Parallel architectures and compilation techniques
The Paralax infrastructure: automatic parallelization with a helping hand

Proceedings of the 19th international conference on Parallel architectures and compilation techniques
Energy- and Performance-Efficient Communication Framework for Embedded MPSoCs through Application-Driven Release Consistency

ACM Transactions on Design Automation of Electronic Systems (TODAES)
Automatic code generation for distributed memory architectures in the polytope model

IPDPS'06 Proceedings of the 20th international conference on Parallel and distributed processing
Parallelization of DNA sequence alignment using OpenMP

Proceedings of the 2011 International Conference on Communication, Computing & Security
McFLAT: a profile-based framework for MATLAB loop analysis and transformations

LCPC'10 Proceedings of the 23rd international conference on Languages and compilers for parallel computing
A programming language interface to describe transformations and code generation

LCPC'10 Proceedings of the 23rd international conference on Languages and compilers for parallel computing
Automatic generation of FPGA-specific pipelined accelerators

ARC'11 Proceedings of the 7th international conference on Reconfigurable computing: architectures, tools and applications
Optimization of dense matrix multiplication on IBM Cyclops-64: challenges and experiences

Euro-Par'06 Proceedings of the 12th international conference on Parallel Processing
Loop transformation recipes for code generation and auto-tuning

LCPC'09 Proceedings of the 22nd international conference on Languages and Compilers for Parallel Computing
Synchronization-free automatic parallelization: beyond affine iteration-space slicing

LCPC'09 Proceedings of the 22nd international conference on Languages and Compilers for Parallel Computing
Polyhedral code generation in the real world

CC'06 Proceedings of the 15th international conference on Compiler Construction
Predictive modeling in a polyhedral optimization space

CGO '11 Proceedings of the 9th Annual IEEE/ACM International Symposium on Code Generation and Optimization
On-chip cache hierarchy-aware tile scheduling for multicore machines

CGO '11 Proceedings of the 9th Annual IEEE/ACM International Symposium on Code Generation and Optimization
Code generation for parallel execution of a class of irregular loops on distributed memory systems

SC '12 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
FPGA-specific synthesis of loop-nests with pipelined computational cores

Microprocessors & Microsystems
When polyhedral transformations meet SIMD code generation

Proceedings of the 34th ACM SIGPLAN conference on Programming language design and implementation
Integrating profile-driven parallelism detection and machine-learning-based mapping

ACM Transactions on Architecture and Code Optimization (TACO)

Abstract