Optimizing transactions for captured memory

  • Authors:
  • Aleksandar Dragojevic;Yang Ni;Ali-Reza Adl-Tabatabai

  • Affiliations:
  • EPFL, Lausanne, Switzerland;Intel Corporation, Santa Clara, CA, USA;Intel Corporation, Santa Clara, CA, USA

  • Venue:
  • Proceedings of the twenty-first annual symposium on Parallelism in algorithms and architectures
  • Year:
  • 2009

Quantified Score

Hi-index 0.02

Visualization

Abstract

In this paper, we identify transaction-local memory as a major source of overhead from compiler instrumentation in software transactional memory (STM). Transaction-local memory is memory allocated inside a transaction, which cannot escape (i.e., is captured by) the allocating transaction. Accesses to such memory do not require calls to STM memory access functions (also called STM barriers). A compiler unaware of that, however, may translate simple memory load/store operations accessing such memory into more expensive STM barriers. This presents us opportunities to improve STM performance. Our measurements with the STAMP benchmark suite (version 0.9.9) revealed that as many as 60% of the STM barriers generated by our baseline compiler can be accesses to captured memory, which include 90% of the write barriers and 45% of the read barriers. We propose runtime and compiler optimizations to elide STM barriers to captured memory. Similar techniques can also be used to elide barriers for accesses to thread-local and read-only data. We implemented those optimizations in the Intel C++ STM compiler. Our experiments with the STAMP benchmark suite on a Intel Dunnington system (with 24 cores in a 4-node SMP system) showed that upto 18% performance improvement could be achieved at 16 threads.