Eliminating synchronization bottlenecks using adaptive replication

Authors:
Martin C. Rinard; Pedro C. Diniz
Affiliations:
Massachusetts Institute of Technology, Cambridge, MA; University of Southern California, Marina del Rey, CA
Venue:
ACM Transactions on Programming Languages and Systems (TOPLAS)
Year:
2003

Abstract

This article presents a new technique, adaptive replication, for automatically eliminating synchronization bottlenecks in multithreaded programs that perform atomic operations on objects. Synchronization bottlenecks occur when multiple threads attempt to concurrently update the same object. It is often possible to eliminate synchronization bottlenecks by replicating objects. Each thread can then update its own local replica without synchronization and without interacting with other threads. When the computation needs to access the original object, it combines the replicas to produce the correct values in the original object. One potential problem is that eagerly replicating all objects may lead to performance degradation and excessive memory consumption.

Adaptive replication eliminates unnecessary replication by dynamically detecting contention at each object to find and replicate only those objects that would otherwise cause synchronization bottlenecks. We have implemented adaptive replication in the context of a parallelizing compiler for a subset of C++. Given an unannotated sequential program written in C++, the compiler automatically extracts the concurrency, determines when it is legal to apply adaptive replication, and generates parallel code that uses adaptive replication to efficiently eliminate synchronization bottlenecks.

In addition to automatic parallelization and adaptive replication, our compiler also implements a lock coarsening transformation that increases the granularity at which the computation locks objects. The advantage is a reduction in the frequency with which the computation acquires and releases locks; the potential disadvantage is the introduction of new synchronization bottlenecks caused by increases in the sizes of the critical sections. Because the adaptive replication transformation takes place at lock acquisition sites, there is a synergistic interaction between lock coarsening and adaptive replication. Lock coarsening drives down the overhead of using adaptive replication, and adaptive replication eliminates synchronization bottlenecks associated with the overaggressive use of lock coarsening.

Our experimental results show that, for our set of benchmark programs, the combination of lock coarsening and adaptive replication can eliminate synchronization bottlenecks and significantly reduce the synchronization and replication overhead as compared to versions that use none or only one of the transformations.
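
The idea of replicating only under contention can be illustrated with a small hand-written sketch. This is not the code the authors' compiler generates; the class `ReplicatedCounter`, the function `parallel_sum`, and the use of a failed `try_lock` as the contention signal are all assumptions made for illustration. The update here is an addition, which is commutative and associative, so combining the replicas afterwards yields the same value as updating the original object directly.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical shared object: a counter updated by many threads.
class ReplicatedCounter {
public:
    explicit ReplicatedCounter(std::size_t num_threads)
        : replicas_(num_threads) {}

    // Called by the thread with index `tid` (assumed unique per thread).
    void add(std::size_t tid, long delta) {
        Replica& r = replicas_[tid];
        if (r.active) {
            // This thread already owns a replica: update it with no lock.
            r.value += delta;
            return;
        }
        if (lock_.try_lock()) {
            // No contention observed: update the original object directly.
            value_ += delta;
            lock_.unlock();
            return;
        }
        // Lock acquisition failed, which we treat as contention (assumption).
        // try_lock may also fail spuriously; that only causes an extra,
        // harmless replication. Create a local replica from now on.
        r.active = true;
        r.value = delta;
    }

    // Folds all replicas back into the original object. Must be called only
    // when no thread is updating the counter (for example after join()).
    long combine() {
        std::lock_guard<std::mutex> guard(lock_);
        for (Replica& r : replicas_) {
            if (r.active) {
                value_ += r.value;
                r.value = 0;
                r.active = false;
            }
        }
        return value_;
    }

private:
    struct Replica {
        long value = 0;
        bool active = false;
    };

    std::mutex lock_;
    long value_ = 0;                 // the original object
    std::vector<Replica> replicas_;  // one slot per thread, used on demand
};

// Hypothetical driver: every thread adds 1 to the same counter repeatedly.
long parallel_sum(std::size_t num_threads, long updates_per_thread) {
    ReplicatedCounter counter(num_threads);
    std::vector<std::thread> workers;
    for (std::size_t tid = 0; tid < num_threads; ++tid) {
        workers.emplace_back([&counter, tid, updates_per_thread] {
            for (long i = 0; i < updates_per_thread; ++i) {
                counter.add(tid, 1);
            }
        });
    }
    for (std::thread& w : workers) {
        w.join();
    }
    return counter.combine();
}
```

With a single thread the lock is never contended, so no replica is created and the counter behaves like an ordinary locked object. With several threads hammering the same counter, a thread that loses a lock race switches to its private replica and stops synchronizing, which is the behavior the abstract describes for objects that would otherwise become bottlenecks.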