As shared-memory multiprocessors become the dominant commodity source of computation, parallelizing compilers must support mainstream computations that manipulate irregular, pointer-based data structures such as lists, trees, and graphs. Our experience with a parallelizing compiler for this class of applications shows that their synchronization requirements differ significantly from those of traditional parallel computations. Instead of coarse-grain barrier synchronization, irregular computations require synchronization primitives that support efficient fine-grain atomic operations.

The standard implementation mechanism for atomic operations uses mutual exclusion locks. But the overhead of acquiring and releasing locks can reduce performance, and locks can also consume significant amounts of memory. Optimistic synchronization primitives such as load linked/store conditional are an attractive alternative: they require no additional memory and eliminate the use of heavyweight blocking synchronization constructs.

This paper presents our experience using optimistic synchronization to implement fine-grain atomic operations in the context of a parallelizing compiler for irregular, object-based programs. We have implemented two versions of the compiler: one generates code that uses mutual exclusion locks to make operations execute atomically; the other uses optimistic synchronization. This paper presents the first published algorithm that enables compilers to automatically generate optimistically synchronized parallel code. Our experimental results indicate that optimistic synchronization is clearly the superior choice for our set of applications: it significantly reduces memory consumption and improves overall performance.
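To make the contrast concrete, the following is a minimal sketch of the optimistic pattern the abstract describes. Load linked/store conditional is ISA-specific, so this sketch substitutes the C11 compare-and-swap primitive, which expresses the same read-compute-commit-retry discipline; the names `sum` and `atomic_add` are illustrative and not taken from the paper.

```c
#include <stdatomic.h>
#include <stdio.h>

/* Hypothetical accumulator field updated by many parallel tasks.
   A lock-based compiler would guard it with a per-object lock;
   the optimistic version below retries a compare-and-swap instead. */
static _Atomic int sum = 0;

/* Optimistically add v to sum: read the current value, compute the
   new value, and commit only if no other task intervened in between.
   Compare-and-swap stands in for load linked/store conditional here. */
static void atomic_add(int v) {
    int old = atomic_load_explicit(&sum, memory_order_relaxed);
    int next;
    do {
        next = old + v;
        /* On failure, old is refreshed with the value another task
           installed, and the loop recomputes and retries. */
    } while (!atomic_compare_exchange_weak_explicit(
                 &sum, &old, next,
                 memory_order_acq_rel, memory_order_relaxed));
}

int main(void) {
    atomic_add(5);
    atomic_add(7);
    printf("%d\n", atomic_load(&sum)); /* prints 12 */
    return 0;
}
```

Note that the optimistic version stores no lock word alongside the data and never blocks a waiting task, which corresponds to the memory and performance advantages the abstract reports for the optimistically synchronized compiler.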