Compiler Optimizations for Enhancing Parallelism and Their Impact on Architecture Design

Authors: C. D. Polychronopoulos
Venue: IEEE Transactions on Computers, Special issue on architectural support for programming languages and operating systems
Year: 1988

Abstract

By examining the structure and characteristics of parallel programs, the author isolates potential sources of overhead. The first compiler optimization considered is cycle shrinking, which can be used to parallelize certain types of serial loops. Run-time dependence analysis is then considered, along with how it can be performed through compiler-inserted bookkeeping and control statements: loops with unstructured parallelism, which cannot benefit from existing optimizations, can be parallelized through run-time dependence checking. Finally, barrier synchronization is discussed as one of the most serious sources of run-time overhead in parallel programs. To reduce the impact of barriers, the author briefly discusses implementing distributed barriers with a set of shared registers.
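Cycle shrinking can be illustrated on a hypothetical loop whose only dependence is `a[i] = a[i - k] + b[i]`, a flow dependence of constant distance `k`. The minimal sketch below assumes that single dependence and groups the iterations into blocks of `lam = k` consecutive iterations: no iteration in a block reads a value written inside the same block, so each block can run as one parallel (DOALL) step while the blocks themselves run in order. The function names are illustrative, and the inner list comprehension only models the parallel step; this is not code from the paper.

```python
def serial_loop(a, b, k):
    """Original serial loop: iteration i reads a[i - k] (dependence distance k)."""
    a = list(a)
    for i in range(k, len(a)):
        a[i] = a[i - k] + b[i]
    return a


def cycle_shrunk_loop(a, b, k):
    """Cycle-shrunk loop: each group of lam = k consecutive iterations is independent."""
    a = list(a)
    lam = k  # minimum dependence distance = size of each independent group
    n = len(a)
    for start in range(k, n, lam):  # groups still execute serially
        group = range(start, min(start + lam, n))
        # DOALL step: every read a[i - k] refers to an element before `start`,
        # so all new values can be computed first (as parallel processors
        # would) and written afterwards without changing the result.
        new_values = [a[i - k] + b[i] for i in group]
        for i, v in zip(group, new_values):
            a[i] = v
    return a
```

With `k = 1` each group holds a single iteration and the shrunk loop degenerates to the serial one; the larger the dependence distance, the wider each parallel step.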
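For loops whose subscripts are known only at run time, the idea of compiler-inserted bookkeeping can be sketched as a generic inspector/executor wavefront scheme. The sketch assumes a hypothetical loop body `a[writes[i]] = a[reads[i]] + b[i]`, where each iteration reads one element and writes one element; the helper names are made up for illustration, and the bookkeeping described in the paper may be organized differently. The inspector records, per array element, the wave of its last writer and of its latest reader, and places each iteration in the earliest wave that respects flow, anti, and output dependences. The executor then runs the waves in order; iterations inside one wave are independent and could run in parallel.

```python
from collections import defaultdict


def compute_waves(reads, writes):
    """Inspector: earliest wave for each iteration i (reads[i] -> writes[i])."""
    last_write = {}  # element -> wave of its most recent writer
    last_read = {}   # element -> latest wave of any reader so far
    waves = []
    for r, w in zip(reads, writes):
        wave = 1 + max(
            last_write.get(r, -1),  # flow: read after an earlier write
            last_write.get(w, -1),  # output: write after an earlier write
            last_read.get(w, -1),   # anti: write after an earlier read
        )
        waves.append(wave)
        last_read[r] = max(last_read.get(r, -1), wave)
        last_write[w] = wave
    return waves


def run_waves(a, b, reads, writes):
    """Executor: run the loop wave by wave."""
    a = list(a)
    by_wave = defaultdict(list)
    for i, wave in enumerate(compute_waves(reads, writes)):
        by_wave[wave].append(i)
    for wave in sorted(by_wave):  # waves execute serially
        # Iterations of one wave touch disjoint data, so any order (or
        # parallel execution) gives the same result; reversed order is used
        # here only to make that independence visible.
        for i in reversed(by_wave[wave]):
            a[writes[i]] = a[reads[i]] + b[i]
    return a


def run_serial(a, b, reads, writes):
    """Reference: the original serial loop."""
    a = list(a)
    for i in range(len(reads)):
        a[writes[i]] = a[reads[i]] + b[i]
    return a
```

For instance, `compute_waves([0, 1, 2], [3, 4, 5])` returns `[0, 0, 0]` (all iterations independent), while the chain `compute_waves([0, 1, 2], [1, 2, 3])` returns `[0, 1, 2]` (fully serial).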
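The abstract does not spell out the shared-register design for distributed barriers, so the sketch below shows only a conventional centralized counter barrier with sense reversal, to make the synchronization point between two parallel phases concrete; it is not the author's scheme. Every arriving thread updates one shared counter under a single lock, so arrivals are serialized, which is the kind of centralized cost that distributed barrier designs in general try to reduce. The class and function names are hypothetical.

```python
import threading


class CounterBarrier:
    """Centralized, reusable barrier: a shared arrival counter plus a
    'sense' flag that flips each time all parties have arrived."""

    def __init__(self, parties):
        self.parties = parties
        self.count = 0
        self.sense = False
        self.cond = threading.Condition()

    def wait(self):
        with self.cond:
            my_sense = not self.sense  # value the flag takes when this episode ends
            self.count += 1
            if self.count == self.parties:
                # Last arrival: reset for the next episode and release everyone.
                self.count = 0
                self.sense = my_sense
                self.cond.notify_all()
            else:
                while self.sense != my_sense:
                    self.cond.wait()


def two_phase(n_threads):
    """Phase 1 writes one slot per thread; phase 2 reads every slot."""
    barrier = CounterBarrier(n_threads)
    partial = [0] * n_threads
    totals = [0] * n_threads

    def worker(t):
        partial[t] = t + 1        # phase 1: each thread writes its own slot
        barrier.wait()            # no thread reads until every slot is written
        totals[t] = sum(partial)  # phase 2: reads all slots

    threads = [threading.Thread(target=worker, args=(t,)) for t in range(n_threads)]
    for th in threads:
        th.start()
    for th in threads:
        th.join()
    return totals
```

Without the barrier, a fast thread could sum `partial` before slower threads had written their slots; with it, `two_phase(4)` always yields `[10, 10, 10, 10]`.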