Optimizing throughput/power trade-offs in hardware transactional memory using DVFS and intelligent scheduling

  • Authors:
  • Clay Hughes;Tao Li

  • Affiliations:
  • Florida State University, Panama City, FL, USA;University of Florida, Gainesville, FL, USA

  • Venue:
  • Proceedings of the international conference on Supercomputing
  • Year:
  • 2011

Abstract

Power has emerged as a first-order design constraint in modern processors and has energized microarchitecture researchers to produce a growing number of power optimization proposals. Almost in tandem with the move toward more energy-efficient designs, architects have been increasing the number of processing elements (PEs) on a single chip and promoting the concept of running multithreaded workloads. Nevertheless, software still lags behind and is often unable to exploit these additional resources, which has given rise to transactional memory. Transactional memory is a promising programming abstraction that makes it easier for programmers to exploit the resources available in many-core processor systems by removing some of the complexity associated with traditional lock-based programming. This paper proposes new techniques to merge the power and transactional memory domains. An analysis of the per-core and chip-wide power consumption of hardware transactional memory systems (HTMs) pinpoints two areas ripe for power management policies: transactional stalls and aborts. The first proposed policy uses dynamic voltage and frequency scaling (DVFS) during transactional stall periods. By frequency scaling PEs based on their transactional state, DVFS can increase the throughput and energy efficiency of HTMs. The second policy uses a transaction's conflict probability to reschedule transactions and clock gate aborted PEs, reducing overall contention and power consumption within the system. The proposed techniques are evaluated using three HTM configurations and, when combined, are shown to reduce the energy-delay-squared product (ED2P) of the STAMP and SPLASH-2 benchmarks by an average of 18%. Synthetic workloads are used to explore a wider range of program behaviors, and the optimizations are shown to reduce their ED2P by an average of 29%. For comparison, this work is shown to reduce the ED2P by up to 30% relative to previous proposals for energy reduction in HTMs (e.g., transaction serialization).
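
The reported results are expressed in ED2P, which is conventionally computed as energy multiplied by the square of the execution time, so delay is weighted more heavily than energy. The following minimal sketch shows how such a metric and a relative reduction could be computed. The function names and all numeric values are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
def ed2p(energy_joules: float, delay_seconds: float) -> float:
    """Energy-delay-squared product, E * D^2 (lower is better)."""
    return energy_joules * delay_seconds ** 2


def relative_reduction(baseline: float, optimized: float) -> float:
    """Fractional reduction of `optimized` relative to `baseline`."""
    return (baseline - optimized) / baseline


# Hypothetical measurements (illustrative only, not results from the paper).
baseline = ed2p(energy_joules=10.0, delay_seconds=2.0)
optimized = ed2p(energy_joules=8.0, delay_seconds=2.0)

reduction = relative_reduction(baseline, optimized)
print(f"ED2P reduction: {reduction:.0%}")
```

Because delay enters quadratically, a policy that saves energy but lengthens execution time can still increase ED2P, which is why the metric suits evaluations that must not sacrifice throughput.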