Compiler transformations for high-performance computing

Authors:
David F. Bacon; Susan L. Graham; Oliver J. Sharp
Affiliations:
Computer Science Division, University of California, Berkeley, California; Computer Science Division, University of California, Berkeley, California; Computer Science Division, University of California, Berkeley, California
Venue:
ACM Computing Surveys (CSUR)
Year:
1994

Reference count: 156
Citation count: 192

References:
Distributed execution of functional programs using serial combinators

IEEE Transactions on Computers
An empirical study of automatic restructuring of nonnumerical programs for parallel processors

IEEE Transactions on Computers
Strictness analysis—a practical approach

Proc. of a conference on Functional programming languages and computer architecture
Bulldog: a compiler for VLSI architectures
Compilers: principles, techniques, and tools
Advanced compiler optimizations for supercomputers

Communications of the ACM - Special issue on parallelism
Multiplication by integer constants

Software—Practice & Experience
Interprocedural constant propagation

SIGPLAN '86 Proceedings of the 1986 SIGPLAN symposium on Compiler construction
Interprocedural dependence analysis and parallelization

SIGPLAN '86 Proceedings of the 1986 SIGPLAN symposium on Compiler construction
Direct parallelization of call statements

SIGPLAN '86 Proceedings of the 1986 SIGPLAN symposium on Compiler construction
ORBIT: an optimizing compiler for scheme

SIGPLAN '86 Proceedings of the 1986 SIGPLAN symposium on Compiler construction
Revised report on the algorithmic language scheme

ACM SIGPLAN Notices
Dataflow architectures

Annual review of computer science vol. 1, 1986
Structure and interpretation of computer programs
Automatic translation of FORTRAN programs to vector form

ACM Transactions on Programming Languages and Systems (TOPLAS)
Memory storage patterns in parallel processing
Superoptimizer: a look at the smallest program

ASPLOS II Proceedings of the second international conference on Architectural support for programming languages and operating systems
Guided self-scheduling: A practical scheduling scheme for parallel supercomputers

IEEE Transactions on Computers
A VLIW architecture for a trace Scheduling Compiler

IEEE Transactions on Computers - Special issue on architectural support for programming languages and operating systems
Estimating interlock and improving balance for pipelined architectures

Journal of Parallel and Distributed Computing
Analysis of interprocedural side effects in a parallel programming environment

Journal of Parallel and Distributed Computing - Special Issue on Languages, Compilers and environments for Parallel Programming
Loop quantization: a generalized loop unwinding technique

Journal of Parallel and Distributed Computing - Special Issue on Languages, Compilers and environments for Parallel Programming
Strategies for cache and local memory management by global program transformation

Journal of Parallel and Distributed Computing - Special Issue on Languages, Compilers and environments for Parallel Programming
An overview of the PTRAN analysis system for multiprocessing

Journal of Parallel and Distributed Computing - Special Issue on Languages, Compilers and environments for Parallel Programming
Minimizing register usage penalty at procedure calls

PLDI '88 Proceedings of the ACM SIGPLAN 1988 conference on Programming Language design and Implementation
Optimal loop parallelization

PLDI '88 Proceedings of the ACM SIGPLAN 1988 conference on Programming Language design and Implementation
Software pipelining: an effective scheduling technique for VLIW machines

PLDI '88 Proceedings of the ACM SIGPLAN 1988 conference on Programming Language design and Implementation
Principles of runtime support for parallel processors

ICS '88 Proceedings of the 2nd international conference on Supercomputing
A framework for determining useful parallelism

ICS '88 Proceedings of the 2nd international conference on Supercomputing
Array expansion

ICS '88 Proceedings of the 2nd international conference on Supercomputing
The Cydra 5 Departmental Supercomputer: Design Philosophies, Decisions, and Trade-Offs

Computer
Compiling issues for supercomputers

Proceedings of the 1988 ACM/IEEE conference on Supercomputing
Supernode partitioning

POPL '88 Proceedings of the 15th ACM SIGPLAN-SIGACT symposium on Principles of programming languages
Advanced loop optimizations for parallel computers

Proceedings of the 1st International Conference on Supercomputing
Customization: optimizing compiler technology for SELF, a dynamically-typed object-oriented programming language

PLDI '89 Proceedings of the ACM SIGPLAN 1989 Conference on Programming language design and implementation
Evaluating the performance of four snooping cache coherency protocols

ISCA '89 Proceedings of the 16th annual international symposium on Computer architecture
Achieving high instruction cache performance with an optimizing compiler

ISCA '89 Proceedings of the 16th annual international symposium on Computer architecture
Fast interprocedural alias analysis

POPL '89 Proceedings of the 16th ACM SIGPLAN-SIGACT symposium on Principles of programming languages
Scans as Primitive Parallel Operations

IEEE Transactions on Computers
The parascope editor: an interactive parallel programming tool

Proceedings of the 1989 ACM/IEEE conference on Supercomputing
More iteration space tiling

Proceedings of the 1989 ACM/IEEE conference on Supercomputing
A comparison study of automatically vectorizing Fortran compilers

Proceedings of the 1989 ACM/IEEE conference on Supercomputing
Computer architecture: a quantitative approach
Data optimization: allocation of arrays to reduce communication on SIMD machines

Journal of Parallel and Distributed Computing - Massively parallel computation
A mechanism for keeping useful internal information in parallel programming tools: the data access descriptor

Journal of Parallel and Distributed Computing - Special issue: software tools for parallel programming and visualization
Instruction Issue Logic for High-Performance, Interruptible, Multiple Functional Unit, Pipelined Computers

IEEE Transactions on Computers
A Survey of Cache Coherence Schemes for Multiprocessors

Computer
The priority-based coloring approach to register allocation

ACM Transactions on Programming Languages and Systems (TOPLAS)
Updating distributed variables in local computations

Concurrency: Practice and Experience
Profile guided code positioning

PLDI '90 Proceedings of the ACM SIGPLAN 1990 conference on Programming language design and implementation
Improving register allocation for subscripted variables

PLDI '90 Proceedings of the ACM SIGPLAN 1990 conference on Programming language design and implementation
How to read floating point numbers accurately

PLDI '90 Proceedings of the ACM SIGPLAN 1990 conference on Programming language design and implementation
How to print floating-point numbers accurately

PLDI '90 Proceedings of the ACM SIGPLAN 1990 conference on Programming language design and implementation
Compilation of Haskell array comprehensions for scientific computing

PLDI '90 Proceedings of the ACM SIGPLAN 1990 conference on Programming language design and implementation
Constant propagation with conditional branches

ACM Transactions on Programming Languages and Systems (TOPLAS)
The cache performance and optimizations of blocked algorithms

ASPLOS IV Proceedings of the fourth international conference on Architectural support for programming languages and operating systems
Limits of instruction-level parallelism

ASPLOS IV Proceedings of the fourth international conference on Architectural support for programming languages and operating systems
Strip mining on SIMD architectures

ICS '91 Proceedings of the 5th international conference on Supercomputing
Uniform techniques for loop optimization

ICS '91 Proceedings of the 5th international conference on Supercomputing
An experiment with inline substitution

Software—Practice & Experience
Loop distribution with arbitrary control flow

Proceedings of the 1990 ACM/IEEE conference on Supercomputing
Efficient and exact data dependence analysis

PLDI '91 Proceedings of the ACM SIGPLAN 1991 conference on Programming language design and implementation
Practical dependence testing

PLDI '91 Proceedings of the ACM SIGPLAN 1991 conference on Programming language design and implementation
Procedure merging with instruction caches

PLDI '91 Proceedings of the ACM SIGPLAN 1991 conference on Programming language design and implementation
Fortran at ten gigaflops: the connection machine convolution compiler

PLDI '91 Proceedings of the ACM SIGPLAN 1991 conference on Programming language design and implementation
Single instruction stream parallelism is greater than two

ISCA '91 Proceedings of the 18th annual international symposium on Computer architecture
Dynamic Processor Self-Scheduling for General Parallel Nested Loops

IEEE Transactions on Computers
Intelligent program optimization and parallelization for parallel computers
Interprocedural transformations for parallel code generation

Proceedings of the 1991 ACM/IEEE conference on Supercomputing
Compiling programs for nonshared memory machines
Compiling with continuations
Alpha architecture reference manual
Unexpected side effects of inline substitution: a case study

ACM Letters on Programming Languages and Systems (LOPLAS)
SPLASH: Stanford parallel applications for shared-memory

ACM SIGARCH Computer Architecture News
Compiling Fortran D for MIMD distributed-memory machines

Communications of the ACM
A practical algorithm for exact array dependence analysis

Communications of the ACM
New CPU benchmark suites from SPEC

COMPCON '92 Proceedings of the thirty-seventh international conference on COMPCON
Limits of control flow on parallelism

ISCA '92 Proceedings of the 19th annual international symposium on Computer architecture
A general framework for iteration-reordering loop transformations

PLDI '92 Proceedings of the ACM SIGPLAN 1992 conference on Programming language design and implementation
Relaxing SIMD control flow constraints using loop transformations

PLDI '92 Proceedings of the ACM SIGPLAN 1992 conference on Programming language design and implementation
A dynamic scheduling method for irregular parallel programs

PLDI '92 Proceedings of the ACM SIGPLAN 1992 conference on Programming language design and implementation
Eliminating branches using a superoptimizer and the GNU C compiler

PLDI '92 Proceedings of the ACM SIGPLAN 1992 conference on Programming language design and implementation
Automatic data mapping for distributed-memory parallel computers

ICS '92 Proceedings of the 6th international conference on Supercomputing
Array privatization for parallel execution of loops

ICS '92 Proceedings of the 6th international conference on Supercomputing
Design of the IBM System/390 computer family for numerically intensive applications: an overview for engineers and scientists

IBM Journal of Research and Development
Interprocedural modification side effect analysis with pointer aliasing

PLDI '93 Proceedings of the ACM SIGPLAN 1993 conference on Programming language design and implementation
Orchestrating interactions among parallel computations

PLDI '93 Proceedings of the ACM SIGPLAN 1993 conference on Programming language design and implementation
Global optimizations for parallelism and locality on scalable parallel machines

PLDI '93 Proceedings of the ACM SIGPLAN 1993 conference on Programming language design and implementation
Cray Y-MP C90: system features and early benchmark results

Parallel Computing
Array-data flow analysis and its use in array privatization

POPL '93 Proceedings of the 20th ACM SIGPLAN-SIGACT symposium on Principles of programming languages
Automatic array alignment in data-parallel programs

POPL '93 Proceedings of the 20th ACM SIGPLAN-SIGACT symposium on Principles of programming languages
Efficient flow-sensitive interprocedural computation of pointer-induced aliases and side effects

POPL '93 Proceedings of the 20th ACM SIGPLAN-SIGACT symposium on Principles of programming languages
Static and dynamic evaluation of data dependence analysis

ICS '93 Proceedings of the 7th international conference on Supercomputing
CMAX: a Fortran translator for the connection machine system

ICS '93 Proceedings of the 7th international conference on Supercomputing
Automatic data partitioning on distributed memory multicomputers
Preliminary experiences with the Fortran D compiler

Proceedings of the 1993 ACM/IEEE conference on Supercomputing
Improving locality and parallelism in nested loops
Instruction-level parallel processing: history, overview, and perspective

The Journal of Supercomputing - Special issue on instruction-level parallelism
The superblock: an effective technique for VLIW and superscalar compilation

The Journal of Supercomputing - Special issue on instruction-level parallelism
Automatic data allocation to minimize communication on SIMD machines

The Journal of Supercomputing
An optimizing Fortran D compiler for MIMD distributed-memory machines
Memory-hierarchy management
An Algorithm for Translating Boolean Expressions

Journal of the ACM (JACM)
ALPHA—An Automatic Programming System of High Efficiency

Journal of the ACM (JACM)
A Transformation System for Developing Recursive Programs

Journal of the ACM (JACM)
Program Improvement by Source-to-Source Transformation

Journal of the ACM (JACM)
Code Generation for Expressions with Common Subexpressions

Journal of the ACM (JACM)
A Survey of Parallel Machine Organization and Programming

ACM Computing Surveys (CSUR)
Experience with the SETL Optimizer

ACM Transactions on Programming Languages and Systems (TOPLAS)
Global optimization by suppression of partial redundancies

Communications of the ACM
The CRAY-1 computer system

Communications of the ACM - Special issue on computer architecture
Assembling code for machines with span-dependent instructions

Communications of the ACM
An analysis of inline substitution for a structured programming language

Communications of the ACM
The parallel execution of DO loops

Communications of the ACM
Parallel processing: a smart compiler and a dumb machine

SIGPLAN '84 Proceedings of the 1984 SIGPLAN symposium on Compiler construction
Automatic loop interchange

SIGPLAN '84 Proceedings of the 1984 SIGPLAN symposium on Compiler construction
A unified approach to global program optimization

POPL '73 Proceedings of the 1st annual ACM SIGACT-SIGPLAN symposium on Principles of programming languages
Optimizing Supercompilers for Supercomputers
Partitioning and Scheduling Parallel Programs for Multiprocessors
Parallel Programming and Compilers
Dependence Analysis for Supercomputing
Conversion of control dependence to data dependence

POPL '83 Proceedings of the 10th ACM SIGACT-SIGPLAN symposium on Principles of programming languages
Dependence graphs and compiler optimizations

POPL '81 Proceedings of the 8th ACM SIGPLAN-SIGACT symposium on Principles of programming languages
A precise inter-procedural data flow algorithm

POPL '81 Proceedings of the 8th ACM SIGPLAN-SIGACT symposium on Principles of programming languages
An efficient way to find the side effects of procedure calls and the aliases of variables

POPL '79 Proceedings of the 6th ACM SIGACT-SIGPLAN symposium on Principles of programming languages
Program Flow Analysis: Theory and Application
Structure of Computers and Computations
IBM RISC System/6000: Architecture and Performance

IEEE Micro
Architecture of the Pentium Microprocessor

IEEE Micro
False Sharing and Spatial Locality in Multiprocessor Caches

IEEE Transactions on Computers
An Efficient Data Dependence Analysis for Parallelizing Compilers

IEEE Transactions on Parallel and Distributed Systems
An Empirical Study of Fortran Programs for Parallelizing Compilers

IEEE Transactions on Parallel and Distributed Systems
Compiling Communication-Efficient Programs for Massively Parallel Machines

IEEE Transactions on Parallel and Distributed Systems
Compiling Global Name-Space Parallel Loops for Distributed Execution

IEEE Transactions on Parallel and Distributed Systems
A Loop Transformation Theory and an Algorithm to Maximize Parallelism

IEEE Transactions on Parallel and Distributed Systems
The Power Test for Data Dependence

IEEE Transactions on Parallel and Distributed Systems
Loop-Level Parallelism in Numeric and Symbolic Programs

IEEE Transactions on Parallel and Distributed Systems
Perfect Pipelining: A New Loop Parallelization Technique

ESOP '88 Proceedings of the 2nd European Symposium on Programming
Experience in the Automatic Parallelization of Four Perfect-Benchmark Programs

Proceedings of the Fourth International Workshop on Languages and Compilers for Parallel Computing
Automatic Array Privatization

Proceedings of the 6th International Workshop on Languages and Compilers for Parallel Computing
The Alignment-Distribution Graph

Proceedings of the 6th International Workshop on Languages and Compilers for Parallel Computing
A compiler framework for restructuring data declarations to enhance cache and TLB effectiveness

CASCON '94 Proceedings of the 1994 conference of the Centre for Advanced Studies on Collaborative research
Predicting the effects of optimization on a procedure body

SIGPLAN '79 Proceedings of the 1979 SIGPLAN symposium on Compiler construction
Register allocation & spilling via graph coloring

SIGPLAN '82 Proceedings of the 1982 SIGPLAN symposium on Compiler construction
The ILLIAC IV FORTRAN compiler

Proceedings of the conference on Programming languages and compilers for parallel and vector machines
The Paralyzer: Ivtran's Parallelism Analyzer and Synthesizer

Proceedings of the conference on Programming languages and compilers for parallel and vector machines
Fortran for the Texas Instruments ASC system

Proceedings of the conference on Programming languages and compilers for parallel and vector machines
Global common subexpression elimination

Proceedings of a symposium on Compiler optimization
Control and data dependence for program transformations
Improving the performance of virtual memory computers
Speedup of ordinary programs
Dependence analysis for subscripted variables and its application to program transformations
Compiling for locality of reference
Arithmetic shifting considered harmful

ACM SIGPLAN Notices
Programming languages and their compilers: Preliminary notes
A programming language

Cited by:
Abstract interpretation and low-level code optimization

PEPM '95 Proceedings of the 1995 ACM SIGPLAN symposium on Partial evaluation and semantics-based program manipulation
Improving instruction-level parallelism by loop unrolling and dynamic memory disambiguation

Proceedings of the 28th annual international symposium on Microarchitecture
Efficient and language-independent mobile programs

PLDI '96 Proceedings of the ACM SIGPLAN 1996 conference on Programming language design and implementation
Let-floating: moving bindings to give faster programs

Proceedings of the first ACM SIGPLAN international conference on Functional programming
Parallelizing compilers

ACM Computing Surveys (CSUR)
Analysis of benchmark characteristics and benchmark performance prediction

ACM Transactions on Computer Systems (TOCS)
Compiler-directed page coloring for multiprocessors

Proceedings of the seventh international conference on Architectural support for programming languages and operating systems
Address calculation for retargetable compilation and exploration of instruction-set architectures

DAC '96 Proceedings of the 33rd annual Design Automation Conference
Fusion of Loops for Parallelism and Locality

IEEE Transactions on Parallel and Distributed Systems
Low-power mapping of behavioral arrays to multiple memories

ISLPED '96 Proceedings of the 1996 international symposium on Low power electronics and design
A new optimization technique for improving resource exploitation and critical path minimization

ISSS '97 Proceedings of the 10th international symposium on System synthesis
Maximizing parallelism and minimizing synchronization with affine transforms

Proceedings of the 24th ACM SIGPLAN-SIGACT symposium on Principles of programming languages
Effective fine-grain synchronization for automatically parallelized programs using optimistic synchronization primitives

PPOPP '97 Proceedings of the sixth ACM SIGPLAN symposium on Principles and practice of parallel programming
Potential-driven statistical ordering of transformations

DAC '97 Proceedings of the 34th annual Design Automation Conference
Manufacturing cheap, resilient, and stealthy opaque constructs

POPL '98 Proceedings of the 25th ACM SIGPLAN-SIGACT symposium on Principles of programming languages
FACT: a framework for the application of throughput and power optimizing transformations to control-flow intensive behavioral descriptions

DAC '98 Proceedings of the 35th annual Design Automation Conference
A methodology for guided behavioral-level optimization

DAC '98 Proceedings of the 35th annual Design Automation Conference
Loop fusion in high performance Fortran

ICS '98 Proceedings of the 12th international conference on Supercomputing
On the Removal of Anti- and Output-Dependences

International Journal of Parallel Programming
Compact and efficient presentation conversion code

IEEE/ACM Transactions on Networking (TON)
Architecture-level dependence analysis in support of software maintenance

ISAW '98 Proceedings of the third international workshop on Software architecture
Schedule-independent storage mapping for loops

Proceedings of the eighth international conference on Architectural support for programming languages and operating systems
Software watermarking: models and dynamic embeddings

Proceedings of the 26th ACM SIGPLAN-SIGACT symposium on Principles of programming languages
New tiling techniques to improve cache temporal locality

Proceedings of the ACM SIGPLAN 1999 conference on Programming language design and implementation
Towards automated synthesis of data mining programs

KDD '99 Proceedings of the fifth ACM SIGKDD international conference on Knowledge discovery and data mining
Accelerating APL programs with SAC

Proceedings of the conference on APL '99: On track to the 21st century
Effectiveness of abstract interpretation in automatic parallelization: a case study in logic programming

ACM Transactions on Programming Languages and Systems (TOPLAS)
Power optimization using divide-and-conquer techniques for minimization of the number of operations

ACM Transactions on Design Automation of Electronic Systems (TODAES)
On defining application-specific high-level array operations by means of shape-invariant programming facilities

APL '98 Proceedings of the APL98 conference on Array processing language
Effective fine-grain synchronization for automatically parallelized programs using optimistic synchronization primitives

ACM Transactions on Computer Systems (TOCS)
A fuzzy approach to automatic data locality optimization

SAC '96 Proceedings of the 1996 ACM symposium on Applied Computing
Cache-optimal methods for bit-reversals

SC '99 Proceedings of the 1999 ACM/IEEE conference on Supercomputing
Memory characteristics of iterative methods

SC '99 Proceedings of the 1999 ACM/IEEE conference on Supercomputing
Protecting Java code via code obfuscation

Crossroads - Special issue on robotics
Optimized unrolling of nested loops

Proceedings of the 14th international conference on Supercomputing
Programming languages and systems for prototyping concurrent applications

ACM Computing Surveys (CSUR)
Independence in CLP languages

ACM Transactions on Programming Languages and Systems (TOPLAS)
Exploiting Wavefront Parallelism on Large-Scale Shared-Memory Multiprocessors

IEEE Transactions on Parallel and Distributed Systems
Exploiting non-uniform reuse for cache optimization

Proceedings of the 2001 ACM symposium on Applied computing
Time Stamp Algorithms for Runtime Parallelization of DOACROSS Loops with Dynamic Dependences

IEEE Transactions on Parallel and Distributed Systems
Computer aided hand tuning (CAHT): “applying case-based reasoning to performance tuning”

ICS '01 Proceedings of the 15th international conference on Supercomputing
Blocking and array contraction across arbitrarily nested loops using affine partitioning

PPoPP '01 Proceedings of the eighth ACM SIGPLAN symposium on Principles and practices of parallel programming
Loop fusion for memory space optimization

Proceedings of the 14th international symposium on Systems synthesis
Source code transformation based on software cost analysis

Proceedings of the 14th international symposium on Systems synthesis
Source code optimization and profiling of energy consumption in embedded systems

ISSS '00 Proceedings of the 13th international symposium on System synthesis
An empirical evaluation of high level transformations for embedded processors

CASES '01 Proceedings of the 2001 international conference on Compilers, architecture, and synthesis for embedded systems
Energy aware compilation for DSPs with SIMD instructions

Proceedings of the joint conference on Languages, compilers and tools for embedded systems: software and compilers for embedded systems
Optimized Unrolling of Nested Loops

International Journal of Parallel Programming
Memory Design and Exploration for Low Power, Embedded Systems

Journal of VLSI Signal Processing Systems - Special issue on signal processing systems design and implementation
Relational programs: An architecture for robust real-time safety-critical process-control systems

Annals of Software Engineering
Handling Global Constraints in Compiler Strategy

International Journal of Parallel Programming
Compilation Techniques for Multimedia Processors

International Journal of Parallel Programming
A Vectorizing Compiler for Multimedia Extensions

International Journal of Parallel Programming
NaraView: An Interactive 3D Visualization System for Parallelization of Programs

International Journal of Parallel Programming
Reconfigurable Instruction Set Processors from a Hardware/Software Perspective

IEEE Transactions on Software Engineering
Array recovery and high-level transformations for DSP applications

ACM Transactions on Embedded Computing Systems (TECS)
A finite state machine based formal model of software pipelined loops with conditions

Progress in computer research
A Correction Method for Parallel Loop Execution

ICCS '02 Proceedings of the International Conference on Computational Science-Part I
Combining Loop Fusion with Prefetching on Shared-memory Multiprocessors

ICPP '97 Proceedings of the international Conference on Parallel Processing
Time-Stamping Algorithms for Parallelization of Loops at Run-Time

IPPS '97 Proceedings of the 11th International Symposium on Parallel Processing
A BSP Approach to the Scheduling of Tightly-Nested Loops

IPPS '97 Proceedings of the 11th International Symposium on Parallel Processing
A Formal Model of Software Pipelining Loops with Conditions

IPPS '97 Proceedings of the 11th International Symposium on Parallel Processing
Feedback Guided Scheduling of Nested Loops

PARA '00 Proceedings of the 5th International Workshop on Applied Parallel Computing, New Paradigms for HPC in Industry and Academia
A Technique for Parallel Loop Execution

PARA '02 Proceedings of the 6th International Conference on Applied Parallel Computing Advanced Scientific Computing
Foundations of Cognitive Support: Toward Abstract Patterns of Usefulness

DSV-IS '02 Proceedings of the 9th International Workshop on Interactive Systems. Design, Specification, and Verification
Non-approximability of the Bulk Synchronous Task Scheduling Problem

Euro-Par '02 Proceedings of the 8th International Euro-Par Conference on Parallel Processing
Compiler-Directed Reordering of Data by Cyclic Graph Coloring

Euro-Par '99 Proceedings of the 5th International Euro-Par Conference on Parallel Processing
Analysis of Multithreaded Programs

SAS '01 Proceedings of the 8th International Symposium on Static Analysis
Reducing Cache Conflicts by a Parametrized Memory Mapping

ParNum '99 Proceedings of the 4th International ACPC Conference Including Special Tracks on Parallel Numerics and Parallel Computing in Image Processing, Video Processing, and Multimedia: Parallel Computation
Advanced Scalarization of Array Syntax

CC '00 Proceedings of the 9th International Conference on Compiler Construction
Techniques for Effectively Exploiting a Zero Overhead Loop Buffer

CC '00 Proceedings of the 9th International Conference on Compiler Construction
Alias Analysis by Means of a Model Checker

CC '01 Proceedings of the 10th International Conference on Compiler Construction
A Case Study: Effects of WITH-Loop-Folding on the NAS Benchmark MG in SAC

IFL '98 Selected Papers from the 10th International Workshop on Implementation of Functional Languages
Improving Cache Effectiveness through Array Data Layout Manipulation in SAC

IFL '00 Selected Papers from the 12th International Workshop on Implementation of Functional Languages
A Compilation Scheme for a Hierarchy of Array Types

IFL '02 Selected Papers from the 13th International Workshop on Implementation of Functional Languages
On increasing architecture awareness in program optimizations to bridge the gap between peak and sustained processor performance: matrix-multiply revisited

Proceedings of the 2002 ACM/IEEE conference on Supercomputing
Compiler optimization-space exploration

Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization
Predicting the impact of optimizations for embedded systems

Proceedings of the 2003 ACM SIGPLAN conference on Language, compiler, and tool for embedded systems
Memory disambiguation for general-purpose applications

CASCON '95 Proceedings of the 1995 conference of the Centre for Advanced Studies on Collaborative research
Template-based program restructuring - initial experience

CASCON '95 Proceedings of the 1995 conference of the Centre for Advanced Studies on Collaborative research
Reducing Address Bus Transitions for Low Power Memory Mapping

EDTC '96 Proceedings of the 1996 European conference on Design and Test
Using cache optimizing compiler for managing software cache on distributed shared memory system

HPC-ASIA '97 Proceedings of the High-Performance Computing on the Information Superhighway, HPC-Asia '97
Address Code and Arithmetic Optimizations for Embedded Systems

ASP-DAC '02 Proceedings of the 2002 Asia and South Pacific Design Automation Conference
Anticipatory Optimization in Domain Specific Translation

ICSR '98 Proceedings of the 5th International Conference on Software Reuse
Loop Alignment for Memory Accesses Optimization

Proceedings of the 12th international symposium on System synthesis
Using Graph Models in Retargetable Optimizing Compilers for Microprocessors with VLIW Architectures

Cybernetics and Systems Analysis
Memory Hierarchy Management for Iterative Graph Structures

IPPS '98 Proceedings of the 12th International Parallel Processing Symposium
Value reuse optimization: reuse of evaluated math library function calls through compiler generated cache

ACM SIGPLAN Notices
Automatic code generation for a convection scheme

Proceedings of the 2003 ACM symposium on Applied computing
Symbolic transfer function-based approaches to certified compilation

Proceedings of the 31st ACM SIGPLAN-SIGACT symposium on Principles of programming languages
What can we gain by unfolding loops?

ACM SIGPLAN Notices
Single Assignment C: efficient support for high-level array operations in a functional setting

Journal of Functional Programming
Multi-objective co-exploration of source code transformations and design space architectures for low-power embedded systems

Proceedings of the 2004 ACM symposium on Applied computing
Data Reuse Analysis Technique for Software-Controlled Memory Hierarchies

Proceedings of the conference on Design, automation and test in Europe - Volume 1
Improving the adaptability of multi-mode systems via program steering

ISSTA '04 Proceedings of the 2004 ACM SIGSOFT international symposium on Software testing and analysis
An innovative low-power high-performance programmable signal processor for digital communications

IBM Journal of Research and Development
Compiler based exploration of DSP energy savings by SIMD operations

Proceedings of the 2004 Asia and South Pacific Design Automation Conference
Improving Data Locality by Array Contraction

IEEE Transactions on Computers
An extended ANSI C for processors with a multimedia extension

International Journal of Parallel Programming
Control Flow Driven Splitting of Loop Nests at the Source Code Level

DATE '03 Proceedings of the conference on Design, Automation and Test in Europe - Volume 1
Coordinated parallelizing compiler optimizations and high-level synthesis

ACM Transactions on Design Automation of Electronic Systems (TODAES)
A New Architecture for Transformation-Based Generators

IEEE Transactions on Software Engineering
Automatically partitioning packet processing applications for pipelined architectures

Proceedings of the 2005 ACM SIGPLAN conference on Programming language design and implementation
Compiler

Encyclopedia of Computer Science
Analyzing data reuse for cache reconfiguration

ACM Transactions on Embedded Computing Systems (TECS)
Data dependence analysis techniques for increased accuracy and extracted parallelism

International Journal of Parallel Programming - Special issue II: The 17th annual international conference on supercomputing (ICS'03)
Compiler transformations for effectively exploiting a zero overhead loop buffer

Software—Practice & Experience
Behavioral-Level Performance and Power Exploration of Data-Intensive Applications Mapped on Programmable Processors

Journal of VLSI Signal Processing Systems
Compiler optimization of embedded applications for an adaptive SoC architecture

CASES '06 Proceedings of the 2006 international conference on Compilers, architecture and synthesis for embedded systems
On minimizing materializations of array-valued temporaries

ACM Transactions on Programming Languages and Systems (TOPLAS)
A memory model for scientific algorithms on graphics processors

Proceedings of the 2006 ACM/IEEE conference on Supercomputing
A calculus for parallel computations over multidimensional dense arrays

Computer Languages, Systems and Structures
DRDU: A data reuse analysis technique for efficient scratch-pad memory management

ACM Transactions on Design Automation of Electronic Systems (TODAES)
SAC: off-the-shelf support for data-parallelism on multicores

Proceedings of the 2007 workshop on Declarative aspects of multicore programming
Adaptive Service Composition in Flexible Processes

IEEE Transactions on Software Engineering
An operation stacking framework for large ensemble computations

Proceedings of the 21st annual international conference on Supercomputing
Incremental hierarchical memory size estimation for steering of loop transformations

ACM Transactions on Design Automation of Electronic Systems (TODAES)
Dynamic graph-based software fingerprinting

ACM Transactions on Programming Languages and Systems (TOPLAS)
Structural function inlining technique for structurally recursive XML queries

VLDB '02 Proceedings of the 28th international conference on Very Large Data Bases
Influence of procedure cloning on WCET prediction

CODES+ISSS '07 Proceedings of the 5th IEEE/ACM international conference on Hardware/software codesign and system synthesis
SCCP/x: a compilation profile to support testing and verification of optimized code

CASES '07 Proceedings of the 2007 international conference on Compilers, architecture, and synthesis for embedded systems
Software controlled memory layout reorganization for irregular array access patterns

CASES '07 Proceedings of the 2007 international conference on Compilers, architecture, and synthesis for embedded systems
The potential of trace-level parallelism in Java programs

Proceedings of the 5th international symposium on Principles and practice of programming in Java
Quasi-static scheduling for safe futures

Proceedings of the 13th ACM SIGPLAN Symposium on Principles and practice of parallel programming
WCET-driven, code-size critical procedure cloning

SCOPES '08 Proceedings of the 11th international workshop on Software & compilers for embedded systems
Finding free schedules for parameterized loops with affine dependences represented with a single dependence relation

AIC'05 Proceedings of the 5th WSEAS International Conference on Applied Informatics and Communications
A parallel dynamic compiler for CIL bytecode

ACM SIGPLAN Notices
Construction of speculative optimization algorithms

Programming and Computing Software
Compiler driven data layout optimization for regular/irregular array access patterns

Proceedings of the 2008 ACM SIGPLAN-SIGBED conference on Languages, compilers, and tools for embedded systems
A Case Study of Some Issues in the Optimization of Fortran 90 Array Notation

Scientific Programming
Hiding Software Watermarks in Loop Structures

SAS '08 Proceedings of the 15th international symposium on Static Analysis
Using Padding to Optimize Locality in Scientific Applications

ICCS '08 Proceedings of the 8th international conference on Computational Science, Part I
Explicit Dependence Metadata in an Active Visual Effects Library

Languages and Compilers for Parallel Computing
Program transformation for numerical precision

Proceedings of the 2009 ACM SIGPLAN workshop on Partial evaluation and program manipulation
Optimizing parallelism for nested loops with iterational and instructional retiming

Journal of Embedded Computing - Selected papers of EUC 2005
A study of potential parallelism among traces in Java programs

Science of Computer Programming
Dynamic Look Ahead Compilation: A Technique to Hide JIT Compilation Latencies in Multicore Environment

CC '09 Proceedings of the 18th International Conference on Compiler Construction: Held as Part of the Joint European Conferences on Theory and Practice of Software, ETAPS 2009
The effect of unrolling and inlining for Python bytecode optimizations

SYSTOR '09 Proceedings of SYSTOR 2009: The Israeli Experimental Systems Conference
Source level merging of independent programs

Journal of Parallel and Distributed Computing
Placement optimization using data context collected during garbage collection

Proceedings of the 2009 international symposium on Memory management
Alchemist: A Transparent Dependence Distance Profiling Infrastructure

Proceedings of the 7th annual IEEE/ACM International Symposium on Code Generation and Optimization
A Fast and Precise Static Loop Analysis Based on Abstract Interpretation, Program Slicing and Polytope Models

Proceedings of the 7th annual IEEE/ACM International Symposium on Code Generation and Optimization
Iterative development of parallel programs in the ParJava environment

Programming and Computing Software
Three fundamental dimensions of scientific workflow interoperability: Model of computation, language, and execution environment

Future Generation Computer Systems
Automating the generation of composed linear algebra kernels

Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis
Adaptive scratch pad memory management for dynamic behavior of multimedia applications

IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
A directive-based MPI code generator for Linux PC clusters

The Journal of Supercomputing
Energy-Aware Design of Service-Based Applications

ICSOC-ServiceWave '09 Proceedings of the 7th International Joint Conference on Service-Oriented Computing
A highly flexible, parallel virtual machine: design and experience of ILDJIT

Software—Practice & Experience
Implementing the PGI Accelerator model

Proceedings of the 3rd Workshop on General-Purpose Computation on Graphics Processing Units
Algorithms for memory hierarchies: advanced lectures

Algorithms for memory hierarchies: advanced lectures
Axis control in SAC

IFL'02 Proceedings of the 14th international conference on Implementation of functional languages
Locality enhancement by array contraction

LCPC'01 Proceedings of the 14th international conference on Languages and compilers for parallel computing
Towards a source level compiler: source level modulo scheduling

Program analysis and compilation, theory and practice
Static reuse distances for locality-based optimizations in MATLAB

Proceedings of the 24th ACM International Conference on Supercomputing
Exploiting finite precision information to guide data-flow mapping

Proceedings of the 47th Design Automation Conference
Finding the best compromise in compiling compound loops to Verilog

Journal of Systems Architecture: the EUROMICRO Journal
Improving MPI communication via data type fission

Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing
Engineering scalable, cache and space efficient tries for strings

The VLDB Journal — The International Journal on Very Large Data Bases
Computing the correct Increment of Induction Pointers with application to loop unrolling

Journal of Systems Architecture: the EUROMICRO Journal
Techniques and tools for dynamic optimization

IPDPS'06 Proceedings of the 20th international conference on Parallel and distributed processing
Parallelization of module network structure learning and performance tuning on SMP

IPDPS'06 Proceedings of the 20th international conference on Parallel and distributed processing
A code motion technique for accelerating general-purpose computation on the GPU

IPDPS'06 Proceedings of the 20th international conference on Parallel and distributed processing
Transparent runtime parallelization of the R scripting language

Journal of Parallel and Distributed Computing
Redesigning the string hash table, burst trie, and BST to exploit cache

Journal of Experimental Algorithmics (JEA)
A VLIW-based post compilation framework for multimedia embedded DSPs with hardware specific optimizations

MTPP'10 Proceedings of the Second Russia-Taiwan conference on Methods and tools of parallel programming multicomputers
Symbiotic expressions

IFL'09 Proceedings of the 21st international conference on Implementation and application of functional languages
DESOLA: An active linear algebra library using delayed evaluation and runtime code generation

Science of Computer Programming
Maintainable and reusable scientific software adaptation: democratizing scientific software adaptation

Proceedings of the tenth international conference on Aspect-oriented software development
Tackling cache-line stealing effects using run-time adaptation

LCPC'10 Proceedings of the 23rd international conference on Languages and compilers for parallel computing
Codevelopment of multi-level instruction set architecture and hardware for an efficient matrix processor

Neural, Parallel & Scientific Computations
Exploring the tradeoffs between programmability and efficiency in data-parallel accelerators

Proceedings of the 38th annual international symposium on Computer architecture
An experimental approach to the performance penalty of the use of classes in Fortran 95

Advances in Engineering Software
HAWKEYE: effective discovery of dataflow impediments to parallelization

Proceedings of the 2011 ACM international conference on Object oriented programming systems languages and applications
Performance evaluation of highly efficient techniques for software implementation of LFSR

Computers and Electrical Engineering
Minimizing data size for efficient data reuse in grid-enabled medical applications

ISBMDA'06 Proceedings of the 7th international conference on Biological and Medical Data Analysis
Programmable data dependencies and placements

DAMP '12 Proceedings of the 7th workshop on Declarative aspects and applications of multicore programming
Optimizing nested loops with iterational and instructional retiming

EUC'05 Proceedings of the 2005 international conference on Embedded and Ubiquitous Computing
Compiler-optimized kernels: an efficient alternative to hand-coded inner kernels

ICCSA'06 Proceedings of the 2006 international conference on Computational Science and Its Applications - Volume Part V
Compiling high-level languages for vector architectures

LCPC'04 Proceedings of the 17th international conference on Languages and Compilers for High Performance Computing
Removing impediments to loop fusion through code transformations

LCPC'02 Proceedings of the 15th international conference on Languages and Compilers for Parallel Computing
Architecture- and OS-Independent binary-level dynamic test generation

ICICS'09 Proceedings of the 11th international conference on Information and Communications Security
Loop transformations in the ahead-of-time optimization of java bytecode

CC'06 Proceedings of the 15th international conference on Compiler Construction
An approach for semiautomatic locality optimizations using OpenMP

PARA'10 Proceedings of the 10th international conference on Applied Parallel and Scientific Computing - Volume 2
Mat-core: a decoupled matrix core extension for general-purpose processors

Neural, Parallel & Scientific Computations
Integrating Memory Optimization with Mapping Algorithms for Multi-Processors System-on-Chip

ACM Transactions on Embedded Computing Systems (TECS)
Static detection of loop-invariant data structures

ECOOP'12 Proceedings of the 26th European conference on Object-Oriented Programming
Exploring the Tradeoffs between Programmability and Efficiency in Data-Parallel Accelerators

ACM Transactions on Computer Systems (TOCS)
Data-driven equivalence checking

Proceedings of the 2013 ACM SIGPLAN international conference on Object oriented programming systems languages & applications
Vectorization past dependent branches through speculation

PACT '13 Proceedings of the 22nd international conference on Parallel architectures and compilation techniques
The benefits of using variable-length pipelined operations in high-level synthesis

ACM Transactions on Embedded Computing Systems (TECS)
HEAP: A Highly Efficient Adaptive multi-Processor framework

Microprocessors & Microsystems

Abstract

In the last three decades a large number of compiler transformations for optimizing programs have been implemented. Most optimizations for uniprocessors reduce the number of instructions executed by the program using transformations based on the analysis of scalar quantities and data-flow techniques. In contrast, optimizations for high-performance superscalar, vector, and parallel processors maximize parallelism and memory locality with transformations that rely on tracking the properties of arrays using loop dependence analysis.

This survey is a comprehensive overview of the important high-level program restructuring techniques for imperative languages, such as C and Fortran. Transformations for both sequential and various types of parallel architectures are covered in depth. We describe the purpose of each transformation, explain how to determine if it is legal, and give an example of its application.

Programmers wishing to enhance the performance of their code can use this survey to improve their understanding of the optimizations that compilers can perform, or as a reference for techniques to be applied manually. Students can obtain an overview of optimizing compiler technology. Compiler writers can use this survey as a reference for most of the important optimizations developed to date, and as a bibliographic reference for the details of each optimization. Readers are expected to be familiar with modern computer architecture and basic program compilation techniques.
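The abstract's central contrast is that array-oriented transformations are legal only when loop dependence analysis shows they preserve the order of dependent accesses. The C sketch below illustrates this with loop interchange, a standard loop-reordering transformation. It is an illustrative example, not one of the survey's own, and the size `N` and all function names are hypothetical. In the first pair of loops each iteration touches only `a[i][j]`, so swapping the loops is safe and, because C arrays are row-major, gives the inner loop unit-stride accesses. In the second pair, `a[i][j]` reads `a[i-1][j+1]`, a value written by an earlier iteration (distance vector (1,-1)). Interchanging those loops makes that read happen before the write and changes the result.

```c
#include <assert.h>  /* for checks such as assert(scale_orders_agree()) */
#include <string.h>  /* memcmp */

#define N 4  /* hypothetical, deliberately small problem size */

/* Legal case: each iteration reads and writes only a[i][j], so there is no
   loop-carried dependence (direction vector (=,=)) and interchange is safe. */
void scale_ji(int a[N][N], int b[N][N]) {
    for (int j = 0; j < N; j++)          /* column order: stride-N accesses in C */
        for (int i = 0; i < N; i++)
            a[i][j] = 2 * a[i][j] + b[i][j];
}

void scale_ij(int a[N][N], int b[N][N]) {
    for (int i = 0; i < N; i++)          /* interchanged: unit-stride inner loop */
        for (int j = 0; j < N; j++)
            a[i][j] = 2 * a[i][j] + b[i][j];
}

/* Illegal case: a[i][j] reads a[i-1][j+1], written by an earlier iteration
   (distance vector (1,-1), direction (<,>)). */
void skew_ij(int a[N][N]) {
    for (int i = 1; i < N; i++)
        for (int j = 0; j < N - 1; j++)
            a[i][j] = a[i - 1][j + 1] + 1;
}

/* The same loop with i and j interchanged: the read of a[i-1][j+1] now
   happens before the iteration that writes it, so the result changes. */
void skew_ji(int a[N][N]) {
    for (int j = 0; j < N - 1; j++)
        for (int i = 1; i < N; i++)
            a[i][j] = a[i - 1][j + 1] + 1;
}

/* Runs both orders of the legal loop on identical inputs; 1 if they agree. */
int scale_orders_agree(void) {
    int a1[N][N], a2[N][N], b[N][N];
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++) {
            a1[i][j] = a2[i][j] = i + j;
            b[i][j] = i * j;
        }
    scale_ji(a1, b);
    scale_ij(a2, b);
    return memcmp(a1, a2, sizeof a1) == 0;
}

/* Runs both orders of the dependent loop from zeroed arrays; 1 if they agree. */
int skew_orders_agree(void) {
    int a1[N][N] = {{0}}, a2[N][N] = {{0}};
    skew_ij(a1);
    skew_ji(a2);
    return memcmp(a1, a2, sizeof a1) == 0;
}
```

Under these assumptions, `scale_orders_agree()` returns 1 and `skew_orders_agree()` returns 0 (for example, `a[3][0]` ends up as 3 in the original order but 1 after interchange). A compiler reaches the same verdict statically from the direction vectors, without running either version.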