Eliminating synchronization overhead in automatically parallelized programs using dynamic feedback

Authors:
Pedro C. Diniz; Martin C. Rinard
Affiliations:
Univ. of Southern California, Marina del Rey, CA; Massachusetts Institute of Technology, Cambridge
Venue:
ACM Transactions on Computer Systems (TOCS)
Year:
1999

Abstract

This article presents dynamic feedback, a technique that enables computations to adapt dynamically to different execution environments. A compiler that uses dynamic feedback produces several different versions of the same source code; each version uses a different optimization policy. The generated code alternately performs sampling phases and production phases. Each sampling phase measures the overhead of each version in the current environment. Each production phase uses the version with the least overhead in the previous sampling phase. The computation periodically resamples to adjust dynamically to changes in the environment. We have implemented dynamic feedback in the context of a parallelizing compiler for object-based programs. The generated code uses dynamic feedback to automatically choose the best synchronization optimization policy. Our experimental results show that the synchronization optimization policy has a significant impact on the overall performance of the computation, that the best policy varies from program to program, that the compiler is unable to statically choose the best policy, and that dynamic feedback enables the generated code to exhibit performance that is comparable to that of code that has been manually tuned to use the best policy. We have also performed a theoretical analysis that provides, under certain assumptions, a guaranteed optimality bound for dynamic feedback relative to a hypothetical (and unrealizable) optimal algorithm that uses the best policy at every point during the execution.
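
The alternation between sampling and production phases described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function `dynamic_feedback`, the policy names `policy_a` and `policy_b`, the phase lengths, and the simulated overhead model are all assumptions chosen for illustration, and overhead is modeled as a number returned per step rather than measured on real parallel code.

```python
from typing import Callable, List


def dynamic_feedback(
    versions: List[str],
    overhead_at: Callable[[str, int], float],
    num_rounds: int,
    sampling_steps: int,
    production_steps: int,
) -> List[str]:
    """Return the version selected for each production phase.

    Hypothetical sketch: overhead_at(version, step) stands in for running
    one unit of work with the given version and measuring its overhead.
    """
    step = 0
    chosen: List[str] = []
    for _ in range(num_rounds):
        # Sampling phase: run every version briefly and record its overhead
        # in the current environment.
        measured = {}
        for version in versions:
            total = 0.0
            for _ in range(sampling_steps):
                total += overhead_at(version, step)
                step += 1
            measured[version] = total

        # Production phase: use the version with the least measured overhead.
        best = min(measured, key=measured.get)
        chosen.append(best)
        for _ in range(production_steps):
            overhead_at(best, step)
            step += 1
        # The next loop iteration resamples, so a change in the environment
        # is picked up at the following sampling phase.
    return chosen


def simulated_overhead(version: str, step: int) -> float:
    """Assumed environment: policy_a is cheaper before step 48, policy_b after."""
    favoured = "policy_a" if step < 48 else "policy_b"
    return 1.0 if version == favoured else 3.0


selected = dynamic_feedback(
    ["policy_a", "policy_b"], simulated_overhead,
    num_rounds=4, sampling_steps=2, production_steps=20,
)
print(selected)
```

In this simulated run, each round covers 24 steps, so the environment change at step 48 coincides with the third sampling phase and the selection switches from `policy_a` to `policy_b` from that round on. In general a change that occurs in the middle of a production phase goes unnoticed until the next sampling phase, which is the trade-off that the phase lengths control: longer production phases amortize the sampling cost but react more slowly.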