Global optimizations for parallelism and locality on scalable parallel machines

Authors:
Jennifer M. Anderson; Monica S. Lam
Venue:
PLDI '93 Proceedings of the ACM SIGPLAN 1993 conference on Programming language design and implementation
Year:
1993

References: 30
Cited by: 110

Automatic translation of FORTRAN programs to vector form

ACM Transactions on Programming Languages and Systems (TOPLAS)
A global approach to detection of parallelism

A global approach to detection of parallelism
Supernode partitioning

POPL '88 Proceedings of the 15th ACM SIGPLAN-SIGACT symposium on Principles of programming languages
A parallelizing compiler for distributed memory parallel computers

A parallelizing compiler for distributed memory parallel computers
Data optimization: allocation of arrays to reduce communication on SIMD machines

Journal of Parallel and Distributed Computing - Massively parallel computation
Supporting shared data structures on distributed memory architectures

PPOPP '90 Proceedings of the second ACM SIGPLAN symposium on Principles & practice of parallel programming
Optimization of array accesses by collective loop transformations

ICS '91 Proceedings of the 5th international conference on Supercomputing
Loop partitioning for distributed memory multiprocessors as unimodular transformations

ICS '91 Proceedings of the 5th international conference on Supercomputing
Scanning polyhedra with DO loops

PPOPP '91 Proceedings of the third ACM SIGPLAN symposium on Principles and practice of parallel programming
A static performance estimator to guide data partitioning decisions

PPOPP '91 Proceedings of the third ACM SIGPLAN symposium on Principles and practice of parallel programming
Generating explicit communication from shared-memory program references

Proceedings of the 1990 ACM/IEEE conference on Supercomputing
A data locality optimizing algorithm

PLDI '91 Proceedings of the ACM SIGPLAN 1991 conference on Programming language design and implementation
Optimal expression evaluation for data parallel architectures

Journal of Parallel and Distributed Computing
Automatic data mapping for distributed-memory parallel computers

Automatic data mapping for distributed-memory parallel computers
The complexity of multiway cuts (extended abstract)

STOC '92 Proceedings of the twenty-fourth annual ACM symposium on Theory of computing
The Stanford Dash Multiprocessor

Computer
Compiling Fortran D for MIMD distributed-memory machines

Communications of the ACM
Deriving good transformations for mapping nested loops on hierarchical parallel machines in polynomial time

ICS '92 Proceedings of the 6th international conference on Supercomputing
Optimizing for parallelism and data locality

ICS '92 Proceedings of the 6th international conference on Supercomputing
Communication optimization and code generation for distributed memory machines

PLDI '93 Proceedings of the ACM SIGPLAN 1993 conference on Programming language design and implementation
Automatic array alignment in data-parallel programs

POPL '93 Proceedings of the 20th ACM SIGPLAN-SIGACT symposium on Principles of programming languages
Accurate analysis of array references

Accurate analysis of array references
Improving locality and parallelism in nested loops

Improving locality and parallelism in nested loops
An optimizing Fortran D compiler for MIMD distributed-memory machines

An optimizing Fortran D compiler for MIMD distributed-memory machines
Optimizing Supercompilers for Supercomputers

Optimizing Supercompilers for Supercomputers
A Loop Transformation Theory and an Algorithm to Maximize Parallelism

IEEE Transactions on Parallel and Distributed Systems
Demonstration of Automatic Data Partitioning Techniques for Parallelizing Compilers on Multicomputers

IEEE Transactions on Parallel and Distributed Systems
Reduction of Cache Coherence Overhead by Compiler Data Layout and Loop Transformation

Proceedings of the Fourth International Workshop on Languages and Compilers for Parallel Computing
Communication-Free Hyperplane Partitioning of Nested Loops

Proceedings of the Fourth International Workshop on Languages and Compilers for Parallel Computing
Collective Loop Fusion for Array Contraction

Proceedings of the 5th International Workshop on Languages and Compilers for Parallel Computing

Communication optimization and code generation for distributed memory machines

PLDI '93 Proceedings of the ACM SIGPLAN 1993 conference on Programming language design and implementation
Twisted data layout

ICS '94 Proceedings of the 8th international conference on Supercomputing
SUIF: an infrastructure for research on parallelizing and optimizing compilers

ACM SIGPLAN Notices
Compiler transformations for high-performance computing

ACM Computing Surveys (CSUR)
Optimal evaluation of array expressions on massively parallel machines

ACM Transactions on Programming Languages and Systems (TOPLAS)
Supporting dynamic data structures on distributed-memory machines

ACM Transactions on Programming Languages and Systems (TOPLAS)
Unifying data and control transformations for distributed shared-memory machines

PLDI '95 Proceedings of the ACM SIGPLAN 1995 conference on Programming language design and implementation
Compiler optimizations for eliminating barrier synchronization

PPOPP '95 Proceedings of the fifth ACM SIGPLAN symposium on Principles and practice of parallel programming
Data and computation transformations for multiprocessors

PPOPP '95 Proceedings of the fifth ACM SIGPLAN symposium on Principles and practice of parallel programming
Automatic data layout for high performance Fortran

Supercomputing '95 Proceedings of the 1995 ACM/IEEE conference on Supercomputing
Unified compilation techniques for shared and distributed address space machines

ICS '95 Proceedings of the 9th international conference on Supercomputing
Mappings for communication minimization using distribution and alignment

PACT '95 Proceedings of the IFIP WG10.3 working conference on Parallel architectures and compilation techniques
Evaluating the impact of advanced memory systems on compiler-parallelized codes

PACT '95 Proceedings of the IFIP WG10.3 working conference on Parallel architectures and compilation techniques
The influence of caches on the performance of heaps

Journal of Experimental Algorithmics (JEA)
Compiler-directed page coloring for multiprocessors

Proceedings of the seventh international conference on Architectural support for programming languages and operating systems
Minimizing communication while preserving parallelism

ICS '96 Proceedings of the 10th international conference on Supercomputing
Data-localization for Fortran macro-dataflow computation using partial static task assignment

ICS '96 Proceedings of the 10th international conference on Supercomputing
Characterizing the Memory Behavior of Compiler-Parallelized Applications

IEEE Transactions on Parallel and Distributed Systems
Loop Transformations for Fault Detection in Regular Loops on Massively Parallel Systems

IEEE Transactions on Parallel and Distributed Systems
Dynamic feedback: an effective technique for adaptive computing

Proceedings of the ACM SIGPLAN 1997 conference on Programming language design and implementation
Data distribution support on distributed shared memory multiprocessors

Proceedings of the ACM SIGPLAN 1997 conference on Programming language design and implementation
Efficient Algorithms for Data Distribution on Distributed Memory Parallel Computers

IEEE Transactions on Parallel and Distributed Systems
Maximizing parallelism and minimizing synchronization with affine transforms

Proceedings of the 24th ACM SIGPLAN-SIGACT symposium on Principles of programming languages
Dynamic pointer alignment: tiling and communication optimizations for parallel pointer-based computations

PPOPP '97 Proceedings of the sixth ACM SIGPLAN symposium on Principles and practice of parallel programming
A unified compiler algorithm for optimizing locality, parallelism and communication in out-of-core computations

Proceedings of the fifth workshop on I/O in parallel and distributed systems
Tolerating latency in multiprocessors through compiler-inserted prefetching

ACM Transactions on Computer Systems (TOCS)
A user level program transformation tool

ICS '98 Proceedings of the 12th international conference on Supercomputing
Automatic data layout for distributed-memory machines

ACM Transactions on Programming Languages and Systems (TOPLAS)
A Linear Algebra Framework for Automatic Determination of Optimal Data Layouts

IEEE Transactions on Parallel and Distributed Systems
An affine partitioning algorithm to maximize parallelism and minimize communication

ICS '99 Proceedings of the 13th international conference on Supercomputing
Eliminating synchronization overhead in automatically parallelized programs using dynamic feedback

ACM Transactions on Computer Systems (TOCS)
Mapping irregular applications to DIVA, a PIM-based data-intensive architecture

SC '99 Proceedings of the 1999 ACM/IEEE conference on Supercomputing
Simultaneous reference allocation in code generation for dual data memory bank ASIPs

ACM Transactions on Design Automation of Electronic Systems (TODAES)
Statement-Level Communication-Free Partitioning Techniques for Parallelizing Compilers

The Journal of Supercomputing
Deriving Array Distributions by Optimization Techniques

The Journal of Supercomputing
Cacheminer: A Runtime Approach to Exploit Cache Locality on SMP

IEEE Transactions on Parallel and Distributed Systems
A Unified Framework for Optimizing Locality, Parallelism, and Communication in Out-of-Core Computations

IEEE Transactions on Parallel and Distributed Systems
Dynamic data distribution with control flow analysis

Supercomputing '96 Proceedings of the 1996 ACM/IEEE conference on Supercomputing
A Framework for Integrating Data Alignment, Distribution, and Redistribution in Distributed Memory Multiprocessors

IEEE Transactions on Parallel and Distributed Systems
A synthesis of memory mechanisms for distributed architectures

ICS '01 Proceedings of the 15th international conference on Supercomputing
Contention elimination by replication of sequential sections in distributed shared memory programs

PPoPP '01 Proceedings of the eighth ACM SIGPLAN symposium on Principles and practices of parallel programming
Accurate data redistribution cost estimation in software distributed shared memory systems

PPoPP '01 Proceedings of the eighth ACM SIGPLAN symposium on Principles and practices of parallel programming
Optimal tiling for minimizing communication in distributed shared-memory multiprocessors

Compiler optimizations for scalable parallel systems
Communication-free partitioning of nested loops

Compiler optimizations for scalable parallel systems
Solving alignment using elementary linear algebra

Compiler optimizations for scalable parallel systems
A compilation method for communication-efficient partitioning of DOALL loops

Compiler optimizations for scalable parallel systems
Compiler optimization of dynamic data distributions for distributed-memory multicomputers

Compiler optimizations for scalable parallel systems
Supporting dynamic data structures with Olden

Compiler optimizations for scalable parallel systems
Automatic Compilation of Loops to Exploit Operator Parallelism on Configurable Arithmetic Logic Units

IEEE Transactions on Parallel and Distributed Systems
A framework for performance-based program partitioning

Progress in computer research
Compiling parallel code for sparse matrix applications

SC '97 Proceedings of the 1997 ACM/IEEE conference on Supercomputing
Skewed Data Partition and Alignment Techniques for Compiling Programs on Distributed Memory Multicomputers

The Journal of Supercomputing
An Advanced Compiler Framework for Non-Cache-Coherent Multiprocessors

IEEE Transactions on Parallel and Distributed Systems
Improving the performance of DSM systems via compiler involvement

Proceedings of the 1994 ACM/IEEE conference on Supercomputing
A Robust Compile Time Method for Scheduling Task Parallelism on Distributed Memory Machines

The Journal of Supercomputing
Compiler Support for Array Distribution on NUMA Shared Memory Multiprocessors

The Journal of Supercomputing
Communication Optimization for Affine Recurrence Equations Using Broadcast and Locality

International Journal of Parallel Programming
Maximizing Multiprocessor Performance with the SUIF Compiler

Computer
Trends in Shared Memory Multiprocessing

Computer
Multiprocessors from a Software Perspective

IEEE Micro
Automatic Partitioning of Parallel Loops and Data Arrays for Distributed Shared-Memory Multiprocessors

IEEE Transactions on Parallel and Distributed Systems
Region Analysis: A Parallel Elimination Method for Data Flow Analysis

IEEE Transactions on Software Engineering
Segmented Alignment: An Enhanced Model to Align Data Parallel Programs of HPF

The Journal of Supercomputing
How to Optimize Residual Communications?

IPPS '96 Proceedings of the 10th International Parallel Processing Symposium
Performance Modeling and Composition: A Case Study in Cell Simulation

IPPS '96 Proceedings of the 10th International Parallel Processing Symposium
An Adaptive Approach to Data Placement

IPPS '96 Proceedings of the 10th International Parallel Processing Symposium
Efficient Support for Two-Dimensional Data Distributions in Distributed Shared Memory Systems

IPDPS '02 Proceedings of the 16th International Parallel and Distributed Processing Symposium
Fortran RED - A Retargetable Environment for Automatic Data Layout

LCPC '98 Proceedings of the 11th International Workshop on Languages and Compilers for Parallel Computing
Automatic Analysis of Loops to Exploit Operator Parallelism on Reconfigurable Systems

LCPC '98 Proceedings of the 11th International Workshop on Languages and Compilers for Parallel Computing
An Automatic Iteration/Data Distribution Method Based on Access Descriptors for DSMM

LCPC '99 Proceedings of the 12th International Workshop on Languages and Compilers for Parallel Computing
Skewed Data Partition and Alignment Techniques for Compiling Programs on Distributed Memory Multicomputers

ISHPC '00 Proceedings of the Third International Symposium on High Performance Computing
Scheduling the Computations of a Loop Nest with Respect to a Given Mapping

Euro-Par '00 Proceedings from the 6th International Euro-Par Conference on Parallel Processing
Automatic Data Layout Using 0-1 Integer Programming

PACT '94 Proceedings of the IFIP WG10.3 Working Conference on Parallel Architectures and Compilation Techniques
A Constraint Optimization Framework for Mapping a Digital Signal Processing Application onto a Parallel Architecture

CP '01 Proceedings of the 7th International Conference on Principles and Practice of Constraint Programming
Data Flow Analysis Driven Dynamic Data Partitioning

LCR '98 Selected Papers from the 4th International Workshop on Languages, Compilers, and Run-Time Systems for Scalable Computers
Optimal task scheduling at run time to exploit intra-tile parallelism

Parallel Computing
Optimization of Data Distribution and Processor Allocation Problem Using Simulated Annealing

The Journal of Supercomputing
Automatic decomposition in EPPP compiler

CASCON '94 Proceedings of the 1994 conference of the Centre for Advanced Studies on Collaborative research
Automatic data mapping of signal processing applications

ASAP '97 Proceedings of the IEEE International Conference on Application-Specific Systems, Architectures and Processors
Three-dimensional orthogonal tile sizing problem: mathematical programming approach

ASAP '97 Proceedings of the IEEE International Conference on Application-Specific Systems, Architectures and Processors
An interprocedural framework for determining efficient data redistributions in distributed memory machines

FRONTIERS '96 Proceedings of the 6th Symposium on the Frontiers of Massively Parallel Computation
Using cache optimizing compiler for managing software cache on distributed shared memory system

HPC-ASIA '97 Proceedings of the High-Performance Computing on the Information Superhighway, HPC-Asia '97
Compiler Techniques for the Distribution of Data and Computation

IEEE Transactions on Parallel and Distributed Systems
Mapping of Affine Loop Nests onto Independent Processors

Cybernetics and Systems Analysis
Linear data distribution based on index analysis

High performance scientific and engineering computing
A data locality optimizing algorithm

ACM SIGPLAN Notices - Best of PLDI 1979-1999
Compact DAG representation and its symbolic scheduling

Journal of Parallel and Distributed Computing
Dyn-MPI: Supporting MPI on Non Dedicated Clusters

Proceedings of the 2003 ACM/IEEE conference on Supercomputing
The MHETA Execution Model for Heterogeneous Clusters

SC '05 Proceedings of the 2005 ACM/IEEE conference on Supercomputing
Optimizing compiler for shared-memory multiple SIMD architecture

Proceedings of the 2006 ACM SIGPLAN/SIGBED conference on Language, compilers, and tool support for embedded systems
Automatic code generation of data decomposition

InfoScale '06 Proceedings of the 1st international conference on Scalable information systems
Instruction scheduling for a tiled dataflow architecture

Proceedings of the 12th international conference on Architectural support for programming languages and operating systems
The rise and fall of High Performance Fortran: an historical object lesson

Proceedings of the third ACM SIGPLAN conference on History of programming languages
Memetic algorithms for parallel code optimization

International Journal of Parallel Programming
Maximize Parallelism Minimize Overhead for Nested Loops via Loop Striping

Journal of VLSI Signal Processing Systems
MPSoC memory optimization using program transformation

ACM Transactions on Design Automation of Electronic Systems (TODAES)
Lightweight barrier-based parallelization support for non-cache-coherent MPSoC platforms

CASES '07 Proceedings of the 2007 international conference on Compilers, architecture, and synthesis for embedded systems
Tile Reduction: The First Step towards Tile Aware Parallelization in OpenMP

IWOMP '09 Proceedings of the 5th International Workshop on OpenMP: Evolving OpenMP in an Age of Extreme Parallelism
Applying Data Mapping Techniques to Vector DSPs

Journal of Signal Processing Systems
Slicing based code parallelization for minimizing inter-processor communication

CASES '09 Proceedings of the 2009 international conference on Compilers, architecture, and synthesis for embedded systems
Automatic memory partitioning and scheduling for throughput and power optimization

Proceedings of the 2009 International Conference on Computer-Aided Design
On the interaction of tiling and automatic parallelization

IWOMP'05/IWOMP'06 Proceedings of the 2005 and 2006 international conference on OpenMP shared memory parallel programming
Data locality and parallelism optimization using a constraint-based approach

Journal of Parallel and Distributed Computing
Parallelization of DNA sequence alignment using OpenMP

Proceedings of the 2011 International Conference on Communication, Computing & Security
PLDS: Partitioning linked data structures for parallelism

ACM Transactions on Architecture and Code Optimization (TACO) - HIPEAC Papers
Loop striping: maximize parallelism for nested loops

EUC'06 Proceedings of the 2006 international conference on Embedded and Ubiquitous Computing
Optimization of dense matrix multiplication on IBM cyclops-64: challenges and experiences

Euro-Par'06 Proceedings of the 12th international conference on Parallel Processing
Memory partitioning and scheduling co-optimization in behavioral synthesis

Proceedings of the International Conference on Computer-Aided Design
Compiling affine loop nests for distributed-memory parallel architectures

SC '13 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis


Abstract