Advanced compiler optimizations for supercomputers
Communications of the ACM - Special issue on parallelism
Compiler algorithms for event variable synchronization
ICS '91 Proceedings of the 5th international conference on Supercomputing
Scanning polyhedra with DO loops
PPOPP '91 Proceedings of the third ACM SIGPLAN symposium on Principles and practice of parallel programming
A compiler-assisted approach to SPMD execution
Proceedings of the 1990 ACM/IEEE conference on Supercomputing
Data-parallel programming on MIMD computers
Data-parallel programming on MIMD computers
Compiling Fortran D for MIMD distributed-memory machines
Communications of the ACM
Relaxing SIMD control flow constraints using loop transformations
PLDI '92 Proceedings of the ACM SIGPLAN 1992 conference on Programming language design and implementation
Global optimizations for parallelism and locality on scalable parallel machines
PLDI '93 Proceedings of the ACM SIGPLAN 1993 conference on Programming language design and implementation
Communication optimization and code generation for distributed memory machines
PLDI '93 Proceedings of the ACM SIGPLAN 1993 conference on Programming language design and implementation
Evaluation of release consistent software distributed shared memory on emerging network technology
ISCA '93 Proceedings of the 20th annual international symposium on computer architecture
SUIF: an infrastructure for research on parallelizing and optimizing compilers
ACM SIGPLAN Notices
Data and computation transformations for multiprocessors
PPOPP '95 Proceedings of the fifth ACM SIGPLAN symposium on Principles and practice of parallel programming
Compiler techniques for data synchronization in nested parallel loops
ICS '90 Proceedings of the 4th international conference on Supercomputing
The impact of synchronization and granularity on parallel systems
ISCA '90 Proceedings of the 17th annual international symposium on Computer Architecture
A Loop Transformation Theory and an Algorithm to Maximize Parallelism
IEEE Transactions on Parallel and Distributed Systems
IEEE Transactions on Parallel and Distributed Systems
Using Processor Affinity in Loop Scheduling on Shared-Memory Multiprocessors
IEEE Transactions on Parallel and Distributed Systems
An Iteration Partition Approach for Cache or Local Memory Thrashing on Parallel Processing
Proceedings of the Fourth International Workshop on Languages and Compilers for Parallel Computing
Proceedings of the 6th International Workshop on Languages and Compilers for Parallel Computing
Synchronization Issues in Data-Parallel Languages
Proceedings of the 6th International Workshop on Languages and Compilers for Parallel Computing
Polaris: Improving the Effectiveness of Parallelizing Compilers
LCPC '94 Proceedings of the 7th International Workshop on Languages and Compilers for Parallel Computing
Static Analysis of Barrier Synchronization in Explicitly Parallel Programs
PACT '94 Proceedings of the IFIP WG10.3 Working Conference on Parallel Architectures and Compilation Techniques
Automatic Data Layout Using 0-1 Integer Programming
PACT '94 Proceedings of the IFIP WG10.3 Working Conference on Parallel Architectures and Compilation Techniques
Automatic synchronisation elimination in synchronous FORALLs
FRONTIERS '95 Proceedings of the Fifth Symposium on the Frontiers of Massively Parallel Computation (Frontiers'95)
Data and computation transformations for multiprocessors
PPOPP '95 Proceedings of the fifth ACM SIGPLAN symposium on Principles and practice of parallel programming
Detecting coarse-grain parallelism using an interprocedural parallelizing compiler
Supercomputing '95 Proceedings of the 1995 ACM/IEEE conference on Supercomputing
Unified compilation techniques for shared and distributed address space machines
ICS '95 Proceedings of the 9th international conference on Supercomputing
Commutativity analysis: a new analysis framework for parallelizing compilers
PLDI '96 Proceedings of the ACM SIGPLAN 1996 conference on Programming language design and implementation
Global communication analysis and optimization
PLDI '96 Proceedings of the ACM SIGPLAN 1996 conference on Programming language design and implementation
Static analysis to reduce synchronization costs in data-parallel programs
POPL '96 Proceedings of the 23rd ACM SIGPLAN-SIGACT symposium on Principles of programming languages
Performance debugging shared memory parallel programs using run-time dependence analysis
SIGMETRICS '97 Proceedings of the 1997 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
A graph based approach to barrier synchronisation minimisation
ICS '97 Proceedings of the 11th international conference on Supercomputing
Synchronization transformations for parallel computing
Proceedings of the 24th ACM SIGPLAN-SIGACT symposium on Principles of programming languages
Commutativity analysis: a new analysis technique for parallelizing compilers
ACM Transactions on Programming Languages and Systems (TOPLAS)
Problem and machine sensitive communication optimization
ICS '98 Proceedings of the 12th international conference on Supercomputing
A Compiler Optimization Algorithm for Shared-Memory Multiprocessors
IEEE Transactions on Parallel and Distributed Systems
An affine partitioning algorithm to maximize parallelism and minimize communication
ICS '99 Proceedings of the 13th international conference on Supercomputing
Effective synchronization removal for Java
PLDI '00 Proceedings of the ACM SIGPLAN 2000 conference on Programming language design and implementation
Minimizing Data and Synchronization Costs in One-Way Communication
IEEE Transactions on Parallel and Distributed Systems
A synthesis of memory mechanisms for distributed architectures
ICS '01 Proceedings of the 15th international conference on Supercomputing
A framework for global communication analysis of optimizations
Compiler optimizations for scalable parallel systems
Compile Time Barrier Synchronization Minimization
IEEE Transactions on Parallel and Distributed Systems
Pointer analysis for structured parallel programs
ACM Transactions on Programming Languages and Systems (TOPLAS)
Eliminating Barrier Synchronization for Compiler-Parallelized Codes on Software DSMs
International Journal of Parallel Programming
IPDPS '02 Proceedings of the 16th International Parallel and Distributed Processing Symposium
IPPS '97 Proceedings of the 11th International Symposium on Parallel Processing
Towards OpenMP Execution on Software Distributed Shared Memory Systems
ISHPC '02 Proceedings of the 4th International Symposium on High Performance Computing
Analysis of Multithreaded Programs
SAS '01 Proceedings of the 8th International Symposium on Static Analysis
Aggressive communication optimizations for clusters of workstations
Cluster computing
Compiler-generated communication for pipelined FPGA applications
Proceedings of the 40th annual Design Automation Conference
Adaptive Parallel System as the High Performance Parallel Architecture
HPC-ASIA '97 Proceedings of the High-Performance Computing on the Information Superhighway, HPC-Asia '97
Compile-time Synchronization Optimizations for Software DSMs
IPPS '98 Proceedings of the 12th. International Parallel Processing Symposium on International Parallel Processing Symposium
Exploiting Processor Workload Heterogeneity for Reducing Energy Consumption in Chip Multiprocessors
Proceedings of the conference on Design, automation and test in Europe - Volume 2
Optimizing OpenMP programs on software distributed shared memory systems
International Journal of Parallel Programming - Special issue: OpenMP: Experiences and implementations
Evaluating heuristics in automatically mapping multi-loop applications to FPGAs
Proceedings of the 2005 ACM/SIGDA 13th international symposium on Field-programmable gate arrays
A linear-time algorithm for optimal barrier placement
Proceedings of the tenth ACM SIGPLAN symposium on Principles and practice of parallel programming
Shared memory multiprocessor support for functional array processing in SAC
Journal of Functional Programming
Interprocedural parallelization analysis in SUIF
ACM Transactions on Programming Languages and Systems (TOPLAS)
The Journal of Supercomputing
Barrier matching for programs with textually unaligned barriers
Proceedings of the 12th ACM SIGPLAN symposium on Principles and practice of parallel programming
Compiler optimization techniques for OpenMP programs
Scientific Programming
Software-directed combined cpu/link voltage scaling fornoc-based cmps
SIGMETRICS '08 Proceedings of the 2008 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
Efficient, portable implementation of asynchronous multi-place programs
Proceedings of the 14th ACM SIGPLAN symposium on Principles and practice of parallel programming
Synchronization optimizations for efficient execution on multi-cores
Proceedings of the 23rd international conference on Supercomputing
Chunking parallel loops in the presence of synchronization
Proceedings of the 23rd international conference on Supercomputing
Compiling for a hybrid programming model using the LMAD representation
LCPC'01 Proceedings of the 14th international conference on Languages and compilers for parallel computing
Reducing task creation and termination overhead in explicitly parallel programs
Proceedings of the 19th international conference on Parallel architectures and compilation techniques
Evaluation of a virtual shared memory machine by the compilation of data parallel loops
EURO-PDP'00 Proceedings of the 8th Euromicro conference on Parallel and distributed processing
Landing stencil code on Godson-T
Journal of Computer Science and Technology
Probabilistic analysis of time reduction by eliminating barriers in parallel programmes
International Journal of Communication Networks and Distributed Systems
Improving energy efficiency of multi-threaded applications using heterogeneous CMOS-TFET multicores
Proceedings of the 17th IEEE/ACM international symposium on Low-power electronics and design
Unifying barrier and point-to-point synchronization in OpenMP with phasers
IWOMP'11 Proceedings of the 7th international conference on OpenMP in the Petascale era
Array replication to increase parallelism in applications mapped to configurable architectures
LCPC'05 Proceedings of the 18th international conference on Languages and Compilers for Parallel Computing
DVM: towards a datacenter-scale virtual machine
VEE '12 Proceedings of the 8th ACM SIGPLAN/SIGOPS conference on Virtual Execution Environments
Enforcing textual alignment of collectives using dynamic checks
LCPC'09 Proceedings of the 22nd international conference on Languages and Compilers for Parallel Computing
Barrier elimination based on access dependency analysis for OpenMP
ISPA'06 Proceedings of the 4th international conference on Parallel and Distributed Processing and Applications
A Transformation Framework for Optimizing Task-Parallel Programs
ACM Transactions on Programming Languages and Systems (TOPLAS)
Hi-index | 0.00 |
This paper presents novel compiler optimizations for reducing synchronization overhead in compiler-parallelized scientific codes. A hybrid programming model is employed to combine the flexibility of the fork-join model with the precision and power of the single-program, multiple data (SPMD) model. By exploiting compile-time computation partitions, communication analysis can eliminate barrier synchronization or replace it with less expensive forms of synchronization. We show computation partitions and data communication can be represented as systems of symbolic linear inequalities for high flexibility and precision. These optimizations has been implemented in the Stanford SUIF compiler. We extensively evaluate their performance using standard benchmark suites. Experimental results show barrier synchronization is reduced 29% on average and by several orders of magnitude for certain programs.