Lightweight lock-free synchronization methods for multithreading

Authors:
Arun Kejariwal;Hideki Saito;Xinmin Tian;Milind Girkar;Wei Li;Utpal Banerjee;Alexandru Nicolau;Constantine D. Polychronopoulos
Affiliations:
University of California at Irvine, Irvine, CA;Intel Corporation, Santa Clara, CA;Intel Corporation, Santa Clara, CA;Intel Corporation, Santa Clara, CA;Intel Corporation, Santa Clara, CA;Intel Corporation, Santa Clara, CA;University of California at Irvine, Irvine, CA;University of Illinois at Urbana-Champaign, Urbana, IL
Venue:
Proceedings of the 20th annual international conference on Supercomputing
Year:
2006

Abstract

The emergence of chip multiprocessors has created a need to exploit thread-level parallelism (TLP) beyond the DOALL type. This calls for efficient thread synchronization techniques that can exploit TLP in general parallel programs with dependences. Several such techniques have been proposed in the past; however, their large run-time overhead limits the exploitation of fine-grain TLP. Furthermore, because the existing approaches are oblivious to the underlying memory model, they can result in (i) deadlocks between threads and (ii) non-deterministic run-time behavior. In this paper, we propose lightweight lock-free thread synchronization methods to exploit TLP in general parallel programs with dependences. Each synchronization method intrinsically guarantees the following in a multithreaded program: (a) sequential consistency, (b) atomicity of writes to the shared synchronization construct, and (c) absence of deadlocks. This reduces the programming effort considerably, thereby easing the development of software for multithreaded systems. For each method, we formally prove that deadlock cannot occur between threads. This spares the programmer the cumbersome and time-consuming process of detecting and eliminating deadlocks. Experiments show that our synchronization methods incur a minimal overhead of 7.16% on average. Further, we achieve speedups of up to 3.39x on kernels extracted from the industry-standard SPEC OMPM 2001 benchmarks, on a dedicated Intel® Xeon® 2.78 GHz 4-way multiprocessor.
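
The abstract does not spell out the proposed methods, so the following is only a generic illustration of the kind of problem it describes: a loop with a cross-iteration dependence (beyond DOALL) run on several threads without locks. It is a minimal sketch, not the authors' technique. The function name `doacross_prefix_sum`, the shared counter `completed`, and the round-robin assignment of iterations to threads are all assumptions made for this example. The counter is a single atomic variable accessed with sequentially consistent ordering, and each iteration waits only on its predecessor, so the waits form a chain that always has a runnable head.

```cpp
#include <atomic>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical helper (not from the paper): computes a[i] = a[i-1] + b[i]
// across `num_threads` threads, with iterations assigned round-robin.
std::vector<long> doacross_prefix_sum(const std::vector<long>& b,
                                      unsigned num_threads) {
    const std::size_t n = b.size();
    std::vector<long> a(n, 0);

    // Number of iterations whose results have been published.
    // Iteration i may read a[i-1] once completed >= i.
    std::atomic<std::size_t> completed{0};

    auto worker = [&](unsigned tid) {
        for (std::size_t i = tid; i < n; i += num_threads) {
            // Independent part of the iteration: can overlap with other threads.
            const long local = b[i];

            // Lock-free wait on the predecessor iteration only.
            while (completed.load(std::memory_order_seq_cst) < i) {
                std::this_thread::yield();
            }

            // Dependent part: safe because iteration i-1 has been published.
            a[i] = (i == 0 ? 0 : a[i - 1]) + local;

            // Atomic, sequentially consistent publication of iteration i.
            completed.store(i + 1, std::memory_order_seq_cst);
        }
    };

    std::vector<std::thread> threads;
    for (unsigned t = 0; t < num_threads; ++t) {
        threads.emplace_back(worker, t);
    }
    for (auto& th : threads) {
        th.join();
    }
    return a;
}
```

In this sketch, each thread processes its iterations in increasing order and iteration 0 never waits, which is why no cycle of waiting threads can form. Only the thread that owns iteration i writes `completed` while its value is i, so the stores never conflict.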