Effective fine-grain synchronization for automatically parallelized programs using optimistic synchronization primitives

Authors:
Martin C. Rinard
Affiliations:
Massachusetts Institute of Technology, Cambridge
Venue:
ACM Transactions on Computer Systems (TOCS)
Year:
1999



Abstract

This article presents our experience using optimistic synchronization to implement fine-grain atomic operations in the context of a parallelizing compiler for irregular, object-based computations. Our experience shows that the synchronization requirements of these programs differ significantly from those of traditional parallel computations, which use loop nests to access dense matrices using affine access functions. In addition to coarse-grain barrier synchronization, our irregular computations require synchronization primitives that support efficient fine-grain atomic operations. The standard implementation mechanism for atomic operations uses mutual exclusion locks, but the overhead of acquiring and releasing locks can reduce performance, and the locks themselves can consume significant amounts of memory. Optimistic synchronization primitives such as load-linked/store-conditional are an attractive alternative: they require no additional memory and eliminate the use of heavyweight blocking synchronization constructs. We evaluate the effectiveness of optimistic synchronization by comparing experimental results from two versions of a parallelizing compiler for irregular, object-based computations. One version generates code that uses mutual exclusion locks to make operations execute atomically. The other version generates code that uses optimistic synchronization. We used this compiler to automatically parallelize three irregular, object-based benchmark applications of interest to the scientific and engineering computation community. The experimental results indicate that the use of optimistic synchronization in this context can significantly reduce memory consumption and improve overall performance.
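The contrast between the two code-generation strategies can be illustrated with a minimal C++ sketch. It is not the compiler's actual output: the names `LockedBody`, `OptimisticBody`, `add_force`, and `run_updates` are hypothetical, and `std::atomic::compare_exchange_weak` stands in for load-linked/store-conditional (on many architectures that provide such instructions, compilers typically implement it with them). The lock-based object carries a mutex field, while the optimistic object reads the value, computes the update, and retries if another thread changed the value in the meantime.

```cpp
#include <atomic>
#include <mutex>
#include <thread>
#include <vector>

// Lock-based version: every object carries its own mutex,
// which costs extra memory per object.
struct LockedBody {
    std::mutex lock;
    double force = 0.0;

    void add_force(double delta) {
        std::lock_guard<std::mutex> guard(lock);  // acquire; released at scope exit
        force += delta;
    }
    double value() { return force; }
};

// Optimistic version: no lock field. The update is attempted and
// retried if another thread modified the value in the meantime.
struct OptimisticBody {
    std::atomic<double> force{0.0};

    void add_force(double delta) {
        double observed = force.load();
        // On failure, compare_exchange_weak reloads `observed`
        // with the current value, so the loop recomputes the sum.
        while (!force.compare_exchange_weak(observed, observed + delta)) {
        }
    }
    double value() { return force.load(); }
};

// Hypothetical driver: several threads apply the same atomic
// update to one shared object, then the final value is returned.
template <typename Body>
double run_updates(int num_threads, int updates_per_thread) {
    Body body;
    std::vector<std::thread> workers;
    for (int t = 0; t < num_threads; ++t) {
        workers.emplace_back([&body, updates_per_thread] {
            for (int i = 0; i < updates_per_thread; ++i) {
                body.add_force(1.0);
            }
        });
    }
    for (auto& w : workers) {
        w.join();
    }
    return body.value();
}
```

Calling `run_updates<LockedBody>(4, 10000)` and `run_updates<OptimisticBody>(4, 10000)` should yield the same total, since both versions make each update atomic. Whether the optimistic version is faster or smaller in practice depends on the platform and the contention level; the sketch only shows the structural difference the abstract describes.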