An affine partitioning algorithm to maximize parallelism and minimize communication

Authors:
Amy W. Lim; Gerald I. Cheong; Monica S. Lam
Affiliations:
Computer Systems Laboratory, Stanford University, Stanford, CA; Microsoft Research in Seattle, Washington and Computer Systems Laboratory, Stanford University, Stanford, CA; Computer Systems Laboratory, Stanford University, Stanford, CA
Venue:
ICS '99 Proceedings of the 13th international conference on Supercomputing
Year:
1999

Citing: 19 (references, listed first below)
Cited by: 55 (listed second below)

Advanced compiler optimizations for supercomputers

Communications of the ACM - Special issue on parallelism
Theory of linear and integer programming

Theory of linear and integer programming
Automatic decomposition of scientific programs for parallel execution

POPL '87 Proceedings of the 14th ACM SIGACT-SIGPLAN symposium on Principles of programming languages
Optimizing for parallelism and data locality

ICS '92 Proceedings of the 6th international conference on Supercomputing
Global optimizations for parallelism and locality on scalable parallel machines

PLDI '93 Proceedings of the ACM SIGPLAN 1993 conference on Programming language design and implementation
Improving locality and parallelism in nested loops

Improving locality and parallelism in nested loops
Some efficient solutions to the affine scheduling problem: I. One-dimensional time

International Journal of Parallel Programming
Evaluating automatic parallelization for efficient execution on shared-memory multiprocessors

ICS '94 Proceedings of the 8th international conference on Supercomputing
Compiler optimizations for eliminating barrier synchronization

PPOPP '95 Proceedings of the fifth ACM SIGPLAN symposium on Principles and practice of parallel programming
Data and computation transformations for multiprocessors

PPOPP '95 Proceedings of the fifth ACM SIGPLAN symposium on Principles and practice of parallel programming
Minimizing communication while preserving parallelism

ICS '96 Proceedings of the 10th international conference on Supercomputing
Data-centric multi-level blocking

Proceedings of the ACM SIGPLAN 1997 conference on Programming language design and implementation
Maximizing parallelism and minimizing synchronization with affine transforms

Proceedings of the 24th ACM SIGPLAN-SIGACT symposium on Principles of programming languages
Maximizing parallelism and minimizing synchronization with affine partitions

Parallel Computing - Special issues on languages and compilers for parallel computers
Loop Transformations for Restructuring Compilers: The Foundations

Loop Transformations for Restructuring Compilers: The Foundations
Optimizing Supercompilers for Supercomputers

Optimizing Supercompilers for Supercomputers
Dependence graphs and compiler optimizations

POPL '81 Proceedings of the 8th ACM SIGPLAN-SIGACT symposium on Principles of programming languages
A Loop Transformation Theory and an Algorithm to Maximize Parallelism

IEEE Transactions on Parallel and Distributed Systems
Determining Transformation Sequences for Loop Parallelization

Proceedings of the 5th International Workshop on Languages and Compilers for Parallel Computing

Blocking and array contraction across arbitrarily nested loops using affine partitioning

PPoPP '01 Proceedings of the eighth ACM SIGPLAN symposium on Principles and practices of parallel programming
Automatic data and computation decomposition on distributed memory parallel computers

ACM Transactions on Programming Languages and Systems (TOPLAS)
Skewed Data Partition and Alignment Techniques for Compiling Programs on Distributed Memory Multicomputers

The Journal of Supercomputing
Communication-Free Alignment for Array References with Linear Subscripts in Three Loop Index Variables or Quadratic Subscripts

The Journal of Supercomputing
Communication Optimization for Affine Recurrence Equations Using Broadcast and Locality

International Journal of Parallel Programming
A Parallelization Domain Oriented Multilevel Graph Partitioner

IEEE Transactions on Computers
Skewed Data Partition and Alignment Techniques for Compiling Programs on Distributed Memory Multicomputers

ISHPC '00 Proceedings of the Third International Symposium on High Performance Computing
Static Coarse Grain Task Scheduling with Cache Optimization Using OpenMP

ISHPC '02 Proceedings of the 4th International Symposium on High Performance Computing
Using Elementary Linear Algebra to Solve Data Alignment for Arrays with Linear or Quadratic References

IEEE Transactions on Parallel and Distributed Systems
Transforming Complex Loop Nests for Locality

The Journal of Supercomputing
Static coarse grain task scheduling with cache optimization using OpenMP

International Journal of Parallel Programming - Special issue: OpenMP: Experiences and implementations
A Compiler Analysis of Interprocedural Data Communication

Proceedings of the 2003 ACM/IEEE conference on Supercomputing
Automatic blocking of QR and LU factorizations for locality

MSP '04 Proceedings of the 2004 workshop on Memory system performance
Improving Memory Hierarchy Performance through Combined Loop Interchange and Multi-Level Fusion

International Journal of High Performance Computing Applications
A consistent generation of pipeline parallelism and distribution of operations and data among processors

Programming and Computer Software
Reuse analysis of indirectly indexed arrays

ACM Transactions on Design Automation of Electronic Systems (TODAES)
Optimizing code parallelization through a constraint network based approach

Proceedings of the 43rd annual Design Automation Conference
Auto-CFD-NOW: A pre-compiler for effectively parallelizing CFD applications on networks of workstations

The Journal of Supercomputing
Automatic mapping of nested loops to FPGAs

Proceedings of the 12th ACM SIGPLAN symposium on Principles and practice of parallel programming
Maximize Parallelism Minimize Overhead for Nested Loops via Loop Striping

Journal of VLSI Signal Processing Systems
Combining symbolic execution with model checking to verify parallel numerical programs

ACM Transactions on Software Engineering and Methodology (TOSEM)
Finding free schedules for parameterized loops with affine dependences represented with a single dependence relation

AIC'05 Proceedings of the 5th WSEAS International Conference on Applied Informatics and Communications
A practical automatic polyhedral parallelizer and locality optimizer

Proceedings of the 2008 ACM SIGPLAN conference on Programming language design and implementation
A domain specific interconnect for reconfigurable computing

Proceedings of the 2008 ACM SIGPLAN-SIGBED conference on Languages, compilers, and tools for embedded systems
Finding Synchronization-Free Parallelism Represented with Trees of Dependent Operations

ICA3PP '08 Proceedings of the 8th international conference on Algorithms and Architectures for Parallel Processing
Finding Synchronization-Free Slices of Operations in Arbitrarily Nested Loops

ICCSA '08 Proceedings of the international conference on Computational Science and Its Applications, Part II
Transformations techniques for extracting parallelism in non-uniform nested loops

WSEAS Transactions on Computers
Compiler-assisted dynamic scheduling for effective parallelization of loop nests on multicore processors

Proceedings of the 14th ACM SIGPLAN symposium on Principles and practice of parallel programming
Affine and unimodular transformations for non-uniform nested loops

ICCOMP'08 Proceedings of the 12th WSEAS international conference on Computers
Parametric multi-level tiling of imperfectly nested loops

Proceedings of the 23rd international conference on Supercomputing
On control signals for multi-dimensional time

LCPC'06 Proceedings of the 19th international conference on Languages and compilers for parallel computing
Finding coarse grained parallelism in computational geometry algorithms

ICCSA'03 Proceedings of the 2003 international conference on Computational science and its applications: Part III
Coarse grain task parallel processing with cache optimization on shared memory multiprocessor

LCPC'01 Proceedings of the 14th international conference on Languages and compilers for parallel computing
Automatic transformations for communication-minimized parallelization and locality optimization in the polyhedral model

CC'08/ETAPS'08 Proceedings of the Joint European Conferences on Theory and Practice of Software 17th international conference on Compiler construction
Extracting both affine and non-linear synchronization-free slices in program loops

PPAM'09 Proceedings of the 8th international conference on Parallel processing and applied mathematics: Part I
Code scheduling for optimizing parallelism and data locality

EuroPar'10 Proceedings of the 16th international Euro-Par conference on Parallel processing: Part I
Data locality and parallelism optimization using a constraint-based approach

Journal of Parallel and Distributed Computing
Locality optimization of stencil applications using data dependency graphs

LCPC'10 Proceedings of the 23rd international conference on Languages and compilers for parallel computing
Compiler-directed scratch pad memory optimization for embedded multiprocessors

IEEE Transactions on Very Large Scale Integration (VLSI) Systems - Special section on the 2002 international symposium on low-power electronics and design (ISLPED)
Practical loop transformations for tensor contraction expressions on multi-level memory hierarchies

CC'11/ETAPS'11 Proceedings of the 20th international conference on Compiler construction: part of the joint European conferences on theory and practice of software
Coarse-grained loop parallelization: Iteration Space Slicing vs affine transformations

Parallel Computing
Combined loop transformation and hierarchy allocation for data reuse optimization

Proceedings of the International Conference on Computer-Aided Design
Performance of OSCAR multigrain parallelizing compiler on SMP servers

LCPC'04 Proceedings of the 17th international conference on Languages and Compilers for High Performance Computing
On-chip cache hierarchy-aware tile scheduling for multicore machines

CGO '11 Proceedings of the 9th Annual IEEE/ACM International Symposium on Code Generation and Optimization
Communication-free data alignment for arrays with exponential references in parallelizing compilers for scalable parallel systems

The Journal of Supercomputing
Implications of electronics technology trends to algorithm design

VoCS'08 Proceedings of the 2008 international conference on Visions of Computer Science: BCS International Academic Conference
Optimizing memory hierarchy allocation with loop transformations for high-level synthesis

Proceedings of the 49th Annual Design Automation Conference
POET: a scripting language for applying parameterized source-to-source program transformations

Software—Practice & Experience
Automatic privatization for parallel execution of loops

ICAISC'12 Proceedings of the 11th international conference on Artificial Intelligence and Soft Computing - Volume Part II
Using free scheduling for programming graphic cards

Facing the Multicore-Challenge II
Free scheduling for statement instances of parameterized arbitrarily nested affine loops

Parallel Computing
Tiling stencil computations to maximize parallelism

SC '12 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
Improving high level synthesis optimization opportunity through polyhedral transformations

Proceedings of the ACM/SIGDA international symposium on Field programmable gate arrays
Compiling affine loop nests for distributed-memory parallel architectures

SC '13 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
Revisiting loop fusion in the polyhedral framework

Proceedings of the 19th ACM SIGPLAN symposium on Principles and practice of parallel programming
