Parallelizing sequential applications on commodity hardware using a low-cost software transactional memory

  • Authors:
  • Mojtaba Mehrara;Jeff Hao;Po-Chun Hsu;Scott Mahlke

  • Affiliations:
  • University of Michigan, Ann Arbor, MI, USA;University of Michigan, Ann Arbor, MI, USA;University of Michigan, Ann Arbor, MI, USA;University of Michigan, Ann Arbor, MI, USA

  • Venue:
  • Proceedings of the 2009 ACM SIGPLAN conference on Programming language design and implementation
  • Year:
  • 2009

Quantified Score

Hi-index 0.00

Visualization

Abstract

Multicore designs have emerged as the mainstream design paradigm for the microprocessor industry. Unfortunately, providing multiple cores does not directly translate into performance for most applications. The industry has already fallen short of the decades-old performance trend of doubling performance every 18 months. An attractive approach for exploiting multiple cores is to rely on tools, both compilers and runtime optimizers, to automatically extract threads from sequential applications. However, despite decades of research on automatic parallelization, most techniques are only effective in the scientific and data parallel domains where array dominated codes can be precisely analyzed by the compiler. Thread-level speculation offers the opportunity to expand parallelization to general-purpose programs, but at the cost of expensive hardware support. In this paper, we focus on providing low-overhead software support for exploiting speculative parallelism. We propose STMlite, a light-weight software transactional memory model that is customized to facilitate profile-guided automatic loop parallelization. STMlite eliminates a considerable amount of checking and locking overhead in conventional software transactional memory models by decoupling the commit phase from main transaction execution. Further, strong atomicity requirements for generic transactional memories are unnecessary within a stylized automatic parallelization framework. STMlite enables sequential applications to extract meaningful performance gains on commodity multicore hardware.