Optimizing Supercompilers for Supercomputers

Authors:
Michael Joseph Wolfe
Venue:
Optimizing Supercompilers for Supercomputers
Year:
1990

Cited by:
228

On data synchronization for multiprocessors

ISCA '89 Proceedings of the 16th annual international symposium on Computer architecture
GTS: parallelization and vectorization of tight recurrences

Proceedings of the 1989 ACM/IEEE conference on Supercomputing
Refined Fortran: an update

Proceedings of the 1989 ACM/IEEE conference on Supercomputing
A methodology for parallelizing programs for multicomputers and complex memory multiprocessors

Proceedings of the 1989 ACM/IEEE conference on Supercomputing
More iteration space tiling

Proceedings of the 1989 ACM/IEEE conference on Supercomputing
An approach to ordering optimizing transformations

PPOPP '90 Proceedings of the second ACM SIGPLAN symposium on Principles & practice of parallel programming
Run-Time Parallelization and Scheduling of Loops

IEEE Transactions on Computers
Vectorization and parallelization of irregular problems via graph coloring

ICS '91 Proceedings of the 5th international conference on Supercomputing
Automatic transformation of FORTRAN loops to reduce cache conflicts

ICS '91 Proceedings of the 5th international conference on Supercomputing
Optimization of array accesses by collective loop transformations

ICS '91 Proceedings of the 5th international conference on Supercomputing
Semantical interprocedural parallelization: an overview of the PIPS project

ICS '91 Proceedings of the 5th international conference on Supercomputing
Experiences with data dependence abstractions

ICS '91 Proceedings of the 5th international conference on Supercomputing
Extending the I test to direction vectors

ICS '91 Proceedings of the 5th international conference on Supercomputing
Uniform techniques for loop optimization

ICS '91 Proceedings of the 5th international conference on Supercomputing
PATCH—a new algorithm for rapid incremental dependence analysis

ICS '91 Proceedings of the 5th international conference on Supercomputing
A unified framework for systematic loop transformations

PPOPP '91 Proceedings of the third ACM SIGPLAN symposium on Principles and practice of parallel programming
Scanning polyhedra with DO loops

PPOPP '91 Proceedings of the third ACM SIGPLAN symposium on Principles and practice of parallel programming
Removal of redundant dependences in DOACROSS loops with constant dependences

PPOPP '91 Proceedings of the third ACM SIGPLAN symposium on Principles and practice of parallel programming
Generating explicit communication from shared-memory program references

Proceedings of the 1990 ACM/IEEE conference on Supercomputing
Performing data flow analysis in parallel

Proceedings of the 1990 ACM/IEEE conference on Supercomputing
Subdomain dependence test for massive parallelism

Proceedings of the 1990 ACM/IEEE conference on Supercomputing
Efficient and exact data dependence analysis

PLDI '91 Proceedings of the ACM SIGPLAN 1991 conference on Programming language design and implementation
Practical dependence testing

PLDI '91 Proceedings of the ACM SIGPLAN 1991 conference on Programming language design and implementation
Size and access inference for data-parallel programs

PLDI '91 Proceedings of the ACM SIGPLAN 1991 conference on Programming language design and implementation
Fortran at ten gigaflops: the connection machine convolution compiler

PLDI '91 Proceedings of the ACM SIGPLAN 1991 conference on Programming language design and implementation
Techniques for debugging parallel programs with flowback analysis

ACM Transactions on Programming Languages and Systems (TOPLAS)
Debugging parallelized code using code liberation techniques

PADD '91 Proceedings of the 1991 ACM/ONR workshop on Parallel and distributed debugging
The Omega test: a fast and practical integer programming algorithm for dependence analysis

Proceedings of the 1991 ACM/IEEE conference on Supercomputing
Compiler optimizations for Fortran D on MIMD distributed-memory machines

Proceedings of the 1991 ACM/IEEE conference on Supercomputing
Tiling multidimensional iteration spaces for nonshared memory machines

Proceedings of the 1991 ACM/IEEE conference on Supercomputing
Retire Fortran? A debate rekindled

Proceedings of the 1991 ACM/IEEE conference on Supercomputing
Seismic modeling at 14 gigaflops on the connection machine

Proceedings of the 1991 ACM/IEEE conference on Supercomputing
Interprocedural transformations for parallel code generation

Proceedings of the 1991 ACM/IEEE conference on Supercomputing
Instruction-level parallelism in Prolog: analysis and architectural support

ISCA '92 Proceedings of the 19th annual international symposium on Computer architecture
Automatic partitioning of a program dependence graph into parallel tasks

IBM Journal of Research and Development
Eliminating false data dependences using the Omega test

PLDI '92 Proceedings of the ACM SIGPLAN 1992 conference on Programming language design and implementation
A general framework for iteration-reordering loop transformations

PLDI '92 Proceedings of the ACM SIGPLAN 1992 conference on Programming language design and implementation
A comprehensive approach to parallel data flow analysis

ICS '92 Proceedings of the 6th international conference on Supercomputing
On exact data dependence analysis

ICS '92 Proceedings of the 6th international conference on Supercomputing
Optimizing for parallelism and data locality

ICS '92 Proceedings of the 6th international conference on Supercomputing
A transformational approach to compiling Sisal for distributed memory architectures

ICS '92 Proceedings of the 6th international conference on Supercomputing
Access normalization: loop restructuring for NUMA compilers

ASPLOS V Proceedings of the fifth international conference on Architectural support for programming languages and operating systems
Non-unimodular transformations of nested loops

Proceedings of the 1992 ACM/IEEE conference on Supercomputing
Low copy message passing on the Alliant CAMPUS/800

Proceedings of the 1992 ACM/IEEE conference on Supercomputing
Chores: enhanced run-time support for shared-memory parallel computing

ACM Transactions on Computer Systems (TOCS)
Interprocedural modification side effect analysis with pointer aliasing

PLDI '93 Proceedings of the ACM SIGPLAN 1993 conference on Programming language design and implementation
A practical data flow framework for array reference analysis and its use in optimizations

PLDI '93 Proceedings of the ACM SIGPLAN 1993 conference on Programming language design and implementation
Global optimizations for parallelism and locality on scalable parallel machines

PLDI '93 Proceedings of the ACM SIGPLAN 1993 conference on Programming language design and implementation
Access normalization: loop restructuring for NUMA computers

ACM Transactions on Computer Systems (TOCS)
Compiling machine-independent parallel programs

ACM SIGPLAN Notices
CMAX: a Fortran translator for the connection machine system

ICS '93 Proceedings of the 7th international conference on Supercomputing
Partitioning the statement per iteration space using non-singular matrices

ICS '93 Proceedings of the 7th international conference on Supercomputing
Compilation techniques for sparse matrix computations

ICS '93 Proceedings of the 7th international conference on Supercomputing
Partitioning the global space for distributed memory systems

Proceedings of the 1993 ACM/IEEE conference on Supercomputing
Advanced compiler optimizations for sparse computations

Proceedings of the 1993 ACM/IEEE conference on Supercomputing
Data flow analysis for parallel programs

CSC '93 Proceedings of the 1993 ACM conference on Computer science
Compiling nested data-parallel programs for shared-memory multiprocessors

ACM Transactions on Programming Languages and Systems (TOPLAS)
Unified compilation of Fortran 77D and 90D

ACM Letters on Programming Languages and Systems (LOPLAS)
Exploiting the parallelism available in loops

Computer
Program optimization and parallelization using idioms

ACM Transactions on Programming Languages and Systems (TOPLAS)
Parallelizing complex scans and reductions

PLDI '94 Proceedings of the ACM SIGPLAN 1994 conference on Programming language design and implementation
The privatizing DOALL test: a run-time technique for DOALL loop identification and array privatization

ICS '94 Proceedings of the 8th international conference on Supercomputing
Compiler techniques for maximizing fine-grain and coarse-grain parallelism in loops with uniform dependences

ICS '94 Proceedings of the 8th international conference on Supercomputing
Exploiting cache affinity in software cache coherence

ICS '94 Proceedings of the 8th international conference on Supercomputing
Compiler and runtime support for out-of-core HPF programs

ICS '94 Proceedings of the 8th international conference on Supercomputing
Static analysis of upper and lower bounds on dependences and parallelism

ACM Transactions on Programming Languages and Systems (TOPLAS)
Compilation of out-of-core data parallel programs for distributed memory machines

ACM SIGARCH Computer Architecture News - Special issue on input/output in parallel computer systems
Dynamic memory disambiguation for array references

MICRO 27 Proceedings of the 27th annual international symposium on Microarchitecture
Fusing loops with backward inter loop data dependence

ACM SIGPLAN Notices
Instruction scheduling in the TOBEY compiler

IBM Journal of Research and Development
Compiler transformations for high-performance computing

ACM Computing Surveys (CSUR)
An extended form of must alias analysis for dynamic allocation

POPL '95 Proceedings of the 22nd ACM SIGPLAN-SIGACT symposium on Principles of programming languages
Supporting dynamic data structures on distributed-memory machines

ACM Transactions on Programming Languages and Systems (TOPLAS)
Advanced Array Optimizations for High Performance Functional Languages

IEEE Transactions on Parallel and Distributed Systems
Distributed Hardwired Barrier Synchronization for Scalable Multiprocessor Clusters

IEEE Transactions on Parallel and Distributed Systems
The LRPD test: speculative run-time parallelization of loops with privatization and reduction parallelization

PLDI '95 Proceedings of the ACM SIGPLAN 1995 conference on Programming language design and implementation
An empirical study of precise interprocedural array analysis

Scientific Programming
Flattening and parallelizing irregular, recurrent loop nests

PPOPP '95 Proceedings of the fifth ACM SIGPLAN symposium on Principles and practice of parallel programming
Extracting task-level parallelism

ACM Transactions on Programming Languages and Systems (TOPLAS)
Software pipelining

ACM Computing Surveys (CSUR)
Compiler cache optimizations for banded matrix problems

ICS '95 Proceedings of the 9th international conference on Supercomputing
Unified compilation techniques for shared and distributed address space machines

ICS '95 Proceedings of the 9th international conference on Supercomputing
Run-time methods for parallelizing partially parallel loops

ICS '95 Proceedings of the 9th international conference on Supercomputing
Optimal tile size adjustment in compiling general DOACROSS loop nests

ICS '95 Proceedings of the 9th international conference on Supercomputing
Vectorization beyond data dependences

ICS '95 Proceedings of the 9th international conference on Supercomputing
Practical approach to single assignment code

PACT '95 Proceedings of the IFIP WG10.3 working conference on Parallel architectures and compilation techniques
Translation of serial recursive codes to parallel SIMD codes

PACT '95 Proceedings of the IFIP WG10.3 working conference on Parallel architectures and compilation techniques
Valid Transformations: A New Class of Loop Transformations for High-Level Synthesis and Pipelined Scheduling Applications

IEEE Transactions on Parallel and Distributed Systems
Automatic Data Structure Selection and Transformation for Sparse Matrix Computations

IEEE Transactions on Parallel and Distributed Systems
Symbolic analysis for parallelizing compilers

ACM Transactions on Programming Languages and Systems (TOPLAS)
Anticipatory instruction scheduling

Proceedings of the eighth annual ACM symposium on Parallel algorithms and architectures
Detection and global optimization of reduction operations for distributed parallel machines

ICS '96 Proceedings of the 10th international conference on Supercomputing
Data-localization for Fortran macro-dataflow computation using partial static task assignment

ICS '96 Proceedings of the 10th international conference on Supercomputing
The future of program analysis

ACM Computing Surveys (CSUR) - Special issue: position statements on strategic directions in computing research
Achieving Full Parallelism Using Multidimensional Retiming

IEEE Transactions on Parallel and Distributed Systems
Loop Transformations for Fault Detection in Regular Loops on Massively Parallel Systems

IEEE Transactions on Parallel and Distributed Systems
Fusion of Loops for Parallelism and Locality

IEEE Transactions on Parallel and Distributed Systems
On the perfect accuracy of an approximate subscript analysis test

ICS '90 Proceedings of the 4th international conference on Supercomputing
Joint Minimization of Code and Data for Synchronous Dataflow Programs

Formal Methods in System Design
Optimal weighted loop fusion for parallel programs

Proceedings of the ninth annual ACM symposium on Parallel algorithms and architectures
Maximizing parallelism and minimizing synchronization with affine transforms

Proceedings of the 24th ACM SIGPLAN-SIGACT symposium on Principles of programming languages
Alias analysis of executable code

POPL '98 Proceedings of the 25th ACM SIGPLAN-SIGACT symposium on Principles of programming languages
Array SSA form and its use in parallelization

POPL '98 Proceedings of the 25th ACM SIGPLAN-SIGACT symposium on Principles of programming languages
Automatic selection of high-order transformations in the IBM XL FORTRAN compilers

IBM Journal of Research and Development - Special issue: performance analysis and its impact on design
Simulation/evaluation environment for a VLIW processor architecture

IBM Journal of Research and Development - Special issue: performance analysis and its impact on design
Loop fusion in high performance Fortran

ICS '98 Proceedings of the 12th international conference on Supercomputing
Constraint-based array dependence analysis

ACM Transactions on Programming Languages and Systems (TOPLAS)
The LRPD Test: Speculative Run-Time Parallelization of Loops with Privatization and Reduction Parallelization

IEEE Transactions on Parallel and Distributed Systems
An affine partitioning algorithm to maximize parallelism and minimize communication

ICS '99 Proceedings of the 13th international conference on Supercomputing
An efficient message-passing scheduler based on guided self scheduling

ICS '89 Proceedings of the 3rd international conference on Supercomputing
Statically Safe Speculative Execution for Real-Time Systems

IEEE Transactions on Software Engineering
Optimized unrolling of nested loops

Proceedings of the 14th international conference on Supercomputing
Timing Analysis for Data and Wrap-Around Fill Caches

Real-Time Systems
From flop to megaflops: Java for technical computing

ACM Transactions on Programming Languages and Systems (TOPLAS)
Parallel Solutions of Simple Indexed Recurrence Equations

IEEE Transactions on Parallel and Distributed Systems
Register-sensitive selection, duplication, and sequencing of instructions

ICS '01 Proceedings of the 15th international conference on Supercomputing
Loop parallelization algorithms

Compiler optimizations for scalable parallel systems
Communication-free partitioning of nested loops

Compiler optimizations for scalable parallel systems
A schema for interprocedural modification side-effect analysis with pointer aliasing

ACM Transactions on Programming Languages and Systems (TOPLAS)
Automatic partitioning and virtual scheduling for efficient parallel execution

ACM-SE 30 Proceedings of the 30th annual Southeast regional conference
Efficient Parallel Execution of Irregular Recursive Programs

IEEE Transactions on Parallel and Distributed Systems
Compiler Support for Scalable and Efficient Memory Systems

IEEE Transactions on Computers
Compiling stencils in high performance Fortran

SC '97 Proceedings of the 1997 ACM/IEEE conference on Supercomputing
A compiler approach to fast hardware design space exploration in FPGA-based systems

PLDI '02 Proceedings of the ACM SIGPLAN 2002 Conference on Programming language design and implementation
Immutability specification and its applications

JGI '02 Proceedings of the 2002 joint ACM-ISCOPE conference on Java Grande
Synthesis of Embedded Software from Synchronous Dataflow Specifications

Journal of VLSI Signal Processing Systems
Sunder: a programmable hardware prefetch architecture for numerical loops

Proceedings of the 1994 ACM/IEEE conference on Supercomputing
Expressing cross-loop dependencies through hyperplane data dependence analysis

Proceedings of the 1994 ACM/IEEE conference on Supercomputing
Enabling unimodular transformations

Proceedings of the 1994 ACM/IEEE conference on Supercomputing
Relative Debugging of Automatically Parallelized Programs

Automated Software Engineering
Index Set Splitting

International Journal of Parallel Programming
Achieving Scalable Locality with Time Skewing

International Journal of Parallel Programming
Removal of Redundant Dependences in DOACROSS Loops with Constant Dependences

IEEE Transactions on Parallel and Distributed Systems
Interactive Parallel Programming using the ParaScope Editor

IEEE Transactions on Parallel and Distributed Systems
The I Test: An Improved Dependence Test for Automatic Parallelization and Vectorization

IEEE Transactions on Parallel and Distributed Systems
Compiling Communication-Efficient Programs for Massively Parallel Machines

IEEE Transactions on Parallel and Distributed Systems
A Loop Transformation Theory and an Algorithm to Maximize Parallelism

IEEE Transactions on Parallel and Distributed Systems
Compile-Time Techniques for Data Distribution in Distributed Memory Machines

IEEE Transactions on Parallel and Distributed Systems
The Power Test for Data Dependence

IEEE Transactions on Parallel and Distributed Systems
Dependence Uniformization: A Loop Parallelization Technique

IEEE Transactions on Parallel and Distributed Systems
Loop-Level Parallelism in Numeric and Symbolic Programs

IEEE Transactions on Parallel and Distributed Systems
Program Structuring for Effective Parallel Portability

IEEE Transactions on Parallel and Distributed Systems
Loop Coalescing and Scheduling for Barrier MIMD Architectures

IEEE Transactions on Parallel and Distributed Systems
The Direction Vector I Test

IEEE Transactions on Parallel and Distributed Systems
On Loop Transformations for Generalized Cycle Shrinking

IEEE Transactions on Parallel and Distributed Systems
Constructive Methods for Scheduling Uniform Loop Nests

IEEE Transactions on Parallel and Distributed Systems
Communication-Free Data Allocation Techniques for Parallelizing Compilers on Multicomputers

IEEE Transactions on Parallel and Distributed Systems
The Classification, Fusion, and Parallelization of Array Language Primitives

IEEE Transactions on Parallel and Distributed Systems
Loop Transformation Using Nonunimodular Matrices

IEEE Transactions on Parallel and Distributed Systems
A General Methodology of Partitioning and Mapping for Given Regular Arrays

IEEE Transactions on Parallel and Distributed Systems
Efficient Pipelining of Nested Loops: Unroll-and-Squash

IPDPS '02 Proceedings of the 16th International Parallel and Distributed Processing Symposium
The R-LRPD Test: Speculative Parallelization of Partially Parallel Loops

IPDPS '02 Proceedings of the 16th International Parallel and Distributed Processing Symposium
From Flop to MegaFlops: Java for Technical Computing

LCPC '98 Proceedings of the 11th International Workshop on Languages and Compilers for Parallel Computing
Optimized Execution of Fortran 90 Array Language on Symmetric Shared-Memory Multiprocessors

LCPC '98 Proceedings of the 11th International Workshop on Languages and Compilers for Parallel Computing
Dependence Analysis for Java

LCPC '99 Proceedings of the 12th International Workshop on Languages and Compilers for Parallel Computing
Automatic Coarse Grain Task Parallel Processing on SMP Using OpenMP

LCPC '00 Proceedings of the 13th International Workshop on Languages and Compilers for Parallel Computing-Revised Papers
Coarse-Grain Task Parallel Processing Using the OpenMP Backend of the OSCAR Multigrain Parallelizing Compiler

ISHPC '00 Proceedings of the Third International Symposium on High Performance Computing
Irregular Assignment Computations on cc-NUMA Multiprocessors

ISHPC '02 Proceedings of the 4th International Symposium on High Performance Computing
Efficient Dependence Analysis for Java Arrays

Euro-Par '01 Proceedings of the 7th International Euro-Par Conference Manchester on Parallel Processing
Interaction Between Data Parallel Compilation and Data Transfer and Storage Cost Minimization for Multimedia Applications

Euro-Par '99 Proceedings of the 5th International Euro-Par Conference on Parallel Processing
Transformations on Doubly Nested Loops

PACT '94 Proceedings of the IFIP WG10.3 Working Conference on Parallel Architectures and Compilation Techniques
Efficient Execution of Doacross Loops on Distributed Memory Systems

PACT '93 Proceedings of the IFIP WG10.3. Working Conference on Architectures and Compilation Techniques for Fine and Medium Grain Parallelism
Software Pipelining: Petri Net Pacemaker

PACT '93 Proceedings of the IFIP WG10.3. Working Conference on Architectures and Compilation Techniques for Fine and Medium Grain Parallelism
Parallel Computation: MM +/- X

Informatics - 10 Years Back. 10 Years Ahead.
Techniques for Reducing the Overhead of Run-Time Parallelization

CC '00 Proceedings of the 9th International Conference on Compiler Construction
Advanced Scalarization of Array Syntax

CC '00 Proceedings of the 9th International Conference on Compiler Construction
A Technique for FPGA Synthesis Driven by Automatic Source Code Analysis and Transformations

FPL '02 Proceedings of the Reconfigurable Computing Is Going Mainstream, 12th International Conference on Field-Programmable Logic and Applications
Loop Transformations for Hierarchical Parallelism and Locality

LCR '98 Selected Papers from the 4th International Workshop on Languages, Compilers, and Run-Time Systems for Scalable Computers
Speculative Parallelization of Partially Parallel Loops

LCR '00 Selected Papers from the 5th International Workshop on Languages, Compilers, and Run-Time Systems for Scalable Computers
Interaction between parallel compilation and data transfer and storage cost minimization for multimedia applications

Practical parallel computing
Cluster computing with message-passing interface

Highly parallel computations
Optimized software synthesis for synchronous dataflow

ASAP '97 Proceedings of the IEEE International Conference on Application-Specific Systems, Architectures and Processors
A Loop Transformation for Maximizing Parallelism from Single Loops with Nonuniform Dependencies

HPC-ASIA '97 Proceedings of the High-Performance Computing on the Information Superhighway, HPC-Asia '97
A transformation method to reduce loop overhead in HPF compiler

HPC-ASIA '97 Proceedings of the High-Performance Computing on the Information Superhighway, HPC-Asia '97
Simulation of aerodynamics problem on a distributed shared-memory machine

HPC-ASIA '97 Proceedings of the High-Performance Computing on the Information Superhighway, HPC-Asia '97
Detection of Implicit Parallelisms in the Task Parallel Language

HPC-ASIA '97 Proceedings of the High-Performance Computing on the Information Superhighway, HPC-Asia '97
A New Transformation Method to Generate Optimized DO Loop from FORALL Construct

PAS '97 Proceedings of the 2nd AIZU International Symposium on Parallel Algorithms / Architecture Synthesis
References

Sourcebook of parallel computing
Automatic generation of application specific processors

Proceedings of the 2003 international conference on Compilers, architecture and synthesis for embedded systems
Local supercomputing training in the computational sciences using remote national centers

Future Generation Computer Systems - Special issue: Selected papers from the workshop on education in computational sciences held at the ICCS 2002
Transforming Complex Loop Nests for Locality

The Journal of Supercomputing
What can we gain by unfolding loops?

ACM SIGPLAN Notices
High performance air pollution modeling for a power plant environment

Parallel Computing - Special issue: Parallel and distributed scientific and engineering computing
High performance air pollution simulation on shared memory systems

High performance scientific and engineering computing
Compiler Optimization of Memory-Resident Value Communication Between Speculative Threads

Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization
High Performance Air Pollution Simulation Using OpenMP

The Journal of Supercomputing
Interprocedural dependence analysis and parallelization

ACM SIGPLAN Notices - Best of PLDI 1979-1999
Evaluating heuristics in automatically mapping multi-loop applications to FPGAs

Proceedings of the 2005 ACM/SIGDA 13th international symposium on Field-programmable gate arrays
New Complexity Results on Array Contraction and Related Problems

Journal of VLSI Signal Processing Systems
Automatic blocking of QR and LU factorizations for locality

MSP '04 Proceedings of the 2004 workshop on Memory system performance
A novel approach for partitioning iteration spaces with variable densities

Proceedings of the tenth ACM SIGPLAN symposium on Principles and practice of parallel programming
Improving Memory Hierarchy Performance through Combined Loop Interchange and Multi-Level Fusion

International Journal of High Performance Computing Applications
Convergence debugging

Proceedings of the sixth international symposium on Automated analysis-driven debugging
Optimizing inter-processor data locality on embedded chip multiprocessors

Proceedings of the 5th ACM international conference on Embedded software
Efficient Techniques for Advanced Data Dependence Analysis

Proceedings of the 14th International Conference on Parallel Architectures and Compilation Techniques
Software integrity protection using timed executable agents

ASIACCS '06 Proceedings of the 2006 ACM Symposium on Information, computer and communications security
Test suite oscillations

Information Processing Letters
A general approach for partitioning N-dimensional parallel nested loops with conditionals

Proceedings of the eighteenth annual ACM symposium on Parallelism in algorithms and architectures
May-happen-in-parallel analysis of X10 programs

Proceedings of the 12th ACM SIGPLAN symposium on Principles and practice of parallel programming
The rise and fall of High Performance Fortran: an historical object lesson

Proceedings of the third ACM SIGPLAN conference on History of programming languages
NUMACROS: data parallel programming on NUMA multiprocessors

Sedms'93 USENIX Systems on USENIX Experiences with Distributed and Multiprocessor Systems - Volume 4
Optimization principles and application performance evaluation of a multithreaded GPU using CUDA

Proceedings of the 13th ACM SIGPLAN Symposium on Principles and practice of parallel programming
The Fortran-P Translator: Towards Automatic Translation of Fortran 77 Programs for Massively Parallel Processors

Scientific Programming
A Systematic Approach to Automatically Generate Multiple Semantically Equivalent Program Versions

Ada-Europe '08 Proceedings of the 13th Ada-Europe international conference on Reliable Software Technologies
Revisiting Cache Block Superloading

HiPEAC '09 Proceedings of the 4th International Conference on High Performance Embedded Architectures and Compilers
An Approach for Enhancing Inter-processor Data Locality on Chip Multiprocessors

Transactions on High-Performance Embedded Architectures and Compilers I
Cache-aware partitioning of multi-dimensional iteration spaces

SYSTOR '09 Proceedings of SYSTOR 2009: The Israeli Experimental Systems Conference
A comparative study of automatic vectorizing compilers

Parallel Computing
Building the program parallelization system based on a very wide spectrum program transformation system

ICCS'03 Proceedings of the 2003 international conference on Computational science: Part II
Polynomial time array dataflow analysis

LCPC'01 Proceedings of the 14th international conference on Languages and compilers for parallel computing
Verification by parallelization of parametric code

Algebraic and proof-theoretic aspects of non-classical logics
McFLAT: a profile-based framework for MATLAB loop analysis and transformations

LCPC'10 Proceedings of the 23rd international conference on Languages and compilers for parallel computing
How many threads to spawn during program multithreading?

LCPC'10 Proceedings of the 23rd international conference on Languages and compilers for parallel computing
Applying data copy to improve memory performance of general array computations

LCPC'05 Proceedings of the 18th international conference on Languages and Compilers for Parallel Computing
Parallelization of utility programs based on behavior phase analysis

LCPC'05 Proceedings of the 18th international conference on Languages and Compilers for Parallel Computing
Automating verification of loops by parallelization

LPAR'06 Proceedings of the 13th international conference on Logic for Programming, Artificial Intelligence, and Reasoning
An inspector-executor algorithm for irregular assignment parallelization

ISPA'04 Proceedings of the Second international conference on Parallel and Distributed Processing and Applications
A geometric approach for partitioning n-dimensional non-rectangular iteration spaces

LCPC'04 Proceedings of the 17th international conference on Languages and Compilers for High Performance Computing
Applying loop optimizations to object-oriented abstractions through general classification of array semantics

LCPC'04 Proceedings of the 17th international conference on Languages and Compilers for High Performance Computing
Techniques for the parallelization of unstructured grid applications on multi-GPU systems

Proceedings of the 2012 International Workshop on Programming Models and Applications for Multicores and Manycores
Matrix-Based programming optimization for improving memory hierarchy performance on imagine

ISPA'06 Proceedings of the 4th international conference on Parallel and Distributed Processing and Applications
Efficient parallel implementation of sequence analysis algorithms using a global address space model

Mathematical and Computer Modelling: An International Journal
Optimization techniques for efficient HTA programs

Parallel Computing
Layout-oblivious compiler optimization for matrix computations

ACM Transactions on Architecture and Code Optimization (TACO) - Special Issue on High-Performance Embedded Architectures and Compilers
A survey of pipelined workflow scheduling: Models and algorithms

ACM Computing Surveys (CSUR)
Non-affine Extensions to Polyhedral Code Generation

Proceedings of Annual IEEE/ACM International Symposium on Code Generation and Optimization
The Cetus Source-to-Source Compiler Infrastructure: Overview and Evaluation

International Journal of Parallel Programming

Abstract

From the Publisher: Effective use of a supercomputer requires users to have a good algorithm and to express this algorithm in an appropriate language, and requires compilers to generate efficient code. This book investigates several problems facing compiler design for supercomputers, including building efficient and comprehensive data dependence graphs, recurrence relations, the management of compiler temporary variables, and WHILE loops.

The book first proposes an efficient means of representing the flow of data in a program by labeling the arcs in a data dependence graph with "direction vectors" that show how the flow of data corresponds to the loop structure of the program. These data dependence direction vectors are then used in several high-level compiler loop optimizations: loop vectorization, loop concurrentization, loop fusion, and loop interchanging. The book shows how to perform these transformations and how to use them to optimize programs for a wide range of supercomputers.

The recurrence relations studied include arithmetic recurrences with IF statements and recurrences involving both data and control dependence relations in a cycle. The wavefront method of solving recurrences is also treated. The book discusses ways to make the problem of managing temporary arrays more tractable. It concludes by offering several methods for executing WHILE loops and describes a general structure of an optimizing compiler for supercomputers, developed from the author's experience with a test bed compiler.

Michael Wolfe is Associate Professor in the Computer Science and Engineering Department at the Oregon Graduate Center. Optimizing Supercompilers for Supercomputers is included in the series Research Monographs in Parallel Computing. Copublished with Pitman Publishing.
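The abstract's central idea, labeling each dependence arc with a direction vector and using those vectors to decide whether a loop transformation such as interchange is legal, can be illustrated with a small sketch. It assumes a two-deep nest containing one statement that writes one array element and reads another, uses the conventional '<', '=', '>' direction symbols, and finds dependences by brute-force comparison of every pair of iterations rather than by the symbolic dependence tests a production compiler would apply. The function names are hypothetical.

```python
from itertools import product


def to_direction(d):
    """Map one distance component (sink index minus source index) to a direction symbol."""
    return "<" if d > 0 else (">" if d < 0 else "=")


def direction_vectors(n, m, write_sub, read_sub):
    """Brute-force the direction vectors of the hypothetical 2-deep nest
         for i in range(n): for j in range(m): A[write_sub(i, j)] = f(A[read_sub(i, j)])
    by testing every ordered pair of distinct iterations."""
    iters = list(product(range(n), range(m)))  # lexicographic order = execution order
    vectors = set()
    for src in iters:
        for snk in iters:
            if src >= snk:  # the source must execute strictly before the sink
                continue
            flow = write_sub(*src) == read_sub(*snk)    # write, then read
            anti = read_sub(*src) == write_sub(*snk)    # read, then write
            output = write_sub(*src) == write_sub(*snk)  # write, then write
            if flow or anti or output:
                vectors.add(tuple(to_direction(t - s) for s, t in zip(src, snk)))
    return vectors


def interchange_legal(vectors):
    """Swapping the i and j loops is legal unless some direction vector,
    with its components swapped, has '>' as its leftmost non-'=' entry."""
    for v in vectors:
        swapped = (v[1], v[0])
        leading = next((d for d in swapped if d != "="), "=")
        if leading == ">":
            return False
    return True


# A[i][j] = A[i-1][j+1]: the value written at (i, j) is read at (i+1, j-1)
dv = direction_vectors(4, 4, lambda i, j: (i, j), lambda i, j: (i - 1, j + 1))
print(dv, interchange_legal(dv))  # {('<', '>')} False
```

Swapping the loops turns (<, >) into (>, <), which would run a sink iteration before its source, so the sketch rejects interchange for this statement. For A[i][j] = A[i-1][j], the vector (<, =) becomes (=, <) and interchange is allowed.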
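The wavefront method mentioned in the abstract can be sketched for the two-dimensional recurrence A[i][j] = A[i-1][j] + A[i][j-1], an assumed example rather than one taken from the book. Its dependences have distance vectors (1, 0) and (0, 1), so every iteration on the anti-diagonal i + j = k depends only on values from diagonal k - 1; the iterations of one diagonal are independent and could run concurrently. The sketch below simulates this sequentially, visiting each diagonal in reverse order to show that the order within a front does not matter. The function names are hypothetical.

```python
def init(n, m):
    # Row 0 and column 0 hold fixed boundary values of 1.
    return [[1] * m for _ in range(n)]


def sequential(n, m):
    """Original loop nest: for i in 1..n-1, for j in 1..m-1."""
    a = init(n, m)
    for i in range(1, n):
        for j in range(1, m):
            a[i][j] = a[i - 1][j] + a[i][j - 1]
    return a


def wavefront(n, m):
    """Same recurrence executed one anti-diagonal (i + j == k) at a time."""
    a = init(n, m)
    for k in range(2, n + m - 1):  # k runs from 1 + 1 up to (n - 1) + (m - 1)
        lo, hi = max(1, k - (m - 1)), min(n - 1, k - 1)
        front = [(i, k - i) for i in range(lo, hi + 1)]
        # Every (i, j) on this front reads only values from front k - 1,
        # so any order (or a parallel loop) gives the same result.
        for i, j in reversed(front):
            a[i][j] = a[i - 1][j] + a[i][j - 1]
    return a


print(wavefront(5, 7) == sequential(5, 7))  # True
print(wavefront(5, 7)[4][6])  # 210, the binomial coefficient C(10, 4)
```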