Lightweight lock-free synchronization methods for multithreading

  • Authors:
  • Arun Kejariwal; Hideki Saito; Xinmin Tian; Milind Girkar; Wei Li; Utpal Banerjee; Alexandru Nicolau; Constantine D. Polychronopoulos

  • Affiliations:
  • University of California at Irvine, Irvine, CA; Intel Corporation, Santa Clara, CA; Intel Corporation, Santa Clara, CA; Intel Corporation, Santa Clara, CA; Intel Corporation, Santa Clara, CA; Intel Corporation, Santa Clara, CA; University of California at Irvine, Irvine, CA; University of Illinois at Urbana-Champaign, Urbana, IL

  • Venue:
  • Proceedings of the 20th annual international conference on Supercomputing
  • Year:
  • 2006

Abstract

The emergence of chip multiprocessors has created a need to exploit thread-level parallelism (TLP) beyond DOALL-type parallelism. This calls for efficient thread synchronization techniques that can exploit TLP in general parallel programs with dependences. Several such techniques have been proposed in the past; however, they limit the exploitation of fine-grain TLP because of their large run-time overhead. Furthermore, existing approaches can lead to (i) deadlocks between threads and (ii) non-deterministic run-time behavior, because they are oblivious to the underlying memory model. In this paper, we propose lightweight lock-free thread synchronization methods to exploit TLP in general parallel programs with dependences. Each synchronization method intrinsically guarantees the following in a multithreaded program: (a) sequential consistency, (b) atomicity of writes to the shared synchronization construct, and (c) absence of deadlocks. This considerably reduces programming effort, thereby easing the development of software for multithreaded systems. For each method, we formally prove that no deadlock can occur between threads. This relieves the programmer of the cumbersome and time-consuming process of detecting and eliminating deadlocks. Experiments show that our synchronization methods incur a minimal overhead of 7.16% on average. Further, we achieve speedups of up to 3.39x on kernels extracted from the industry-standard SPEC OMPM 2001 benchmarks on a dedicated 4-way Intel® Xeon® 2.78 GHz multiprocessor.
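The abstract does not spell out the paper's synchronization methods, so the following is only a generic, hypothetical sketch of the kind of problem they target: point-to-point synchronization of a loop with a cross-iteration dependence (a DOACROSS loop, i.e. beyond DOALL) using a shared counter instead of a mutex. It assumes C11 `<stdatomic.h>` and POSIX threads. The loop body, `done`, `worker`, `run_doacross`, `N_ITERS` and `N_THREADS` are illustrative names invented here, not the authors' construct. The default `atomic_load`/`atomic_store` operations are sequentially consistent and each write to the counter is atomic, which loosely mirrors guarantees (a) and (b) above.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical loop size and thread count for the illustration. */
#define N_ITERS 1000
#define N_THREADS 4

/* Example loop with a cross-iteration (DOACROSS) dependence:
 *   for (i = 0; i < N_ITERS; i++) a[i] = a[i-1] + 2*b[i];   (a[-1] taken as 0)
 * The 2*b[i] part is independent; only the update of a[i] must wait. */
static long a[N_ITERS];
static long b[N_ITERS];

/* Shared synchronization variable: iterations 0..done-1 have completed
 * their dependent statement. Plain atomic_load/atomic_store use
 * memory_order_seq_cst, and each write to it is atomic. */
static atomic_long done;

static void *worker(void *arg) {
    long tid = (long)(intptr_t)arg;
    /* Cyclic schedule: thread tid runs iterations tid, tid+N_THREADS, ...
     * always in increasing order. */
    for (long i = tid; i < N_ITERS; i += N_THREADS) {
        long local = 2 * b[i];            /* independent part: runs in parallel */

        while (atomic_load(&done) != i)   /* wait: iteration i-1 has not posted yet */
            sched_yield();                /* let the owner of i-1 run if cores are oversubscribed */

        a[i] = (i == 0 ? 0 : a[i - 1]) + local;  /* dependent part */
        atomic_store(&done, i + 1);       /* post: release iteration i+1 */
    }
    return NULL;
}

/* Runs the loop on N_THREADS POSIX threads; aborts if a thread cannot be created. */
void run_doacross(void) {
    pthread_t th[N_THREADS];
    for (long i = 0; i < N_ITERS; i++)
        b[i] = i;
    atomic_store(&done, 0);
    for (long t = 0; t < N_THREADS; t++)
        if (pthread_create(&th[t], NULL, worker, (void *)(intptr_t)t) != 0)
            abort();
    for (long t = 0; t < N_THREADS; t++)
        pthread_join(th[t], NULL);
    assert(atomic_load(&done) == N_ITERS);  /* every iteration has posted */
}
```

After `run_doacross()` returns, `a[i]` equals `i * (i + 1)`, exactly as in sequential execution. Deadlock freedom in this sketch follows by induction: iteration 0 never waits, each iteration waits only on its predecessor, and every thread visits its iterations in increasing order, so the iteration each thread is blocked on is always the next one to become ready. Note that "lock-free" here simply means that no mutex is used; the waiting thread still spins, so this is not lock-free in the strict non-blocking-progress sense.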