Optimizing throughput/power trade-offs in hardware transactional memory using DVFS and intelligent scheduling

Authors:
Clay Hughes;Tao Li
Affiliations:
Florida State University, Panama City, FL, USA;University of Florida, Gainesville, FL, USA
Venue:
Proceedings of the international conference on Supercomputing
Year:
2011

Abstract

Power has emerged as a first-order design constraint in modern processors and has energized microarchitecture researchers to produce a growing number of power optimization proposals. Almost in tandem with the move toward more energy-efficient designs, architects have been increasing the number of processing elements (PEs) on a single chip and promoting the concept of running multithreaded workloads. Nevertheless, software is still lagging behind and is often unable to exploit these additional resources, which has given rise to transactional memory. Transactional memory is a promising programming abstraction that makes it easier for programmers to exploit the resources available in many-core processor systems by removing some of the complexity associated with traditional lock-based programming. This paper proposes new techniques to merge the power and transactional memory domains. An analysis of the per-core and chip-wide power consumption of hardware transactional memory systems (HTMs) pinpoints two areas ripe for power management policies: transactional stalls and aborts. The first proposed policy uses dynamic voltage and frequency scaling (DVFS) during transactional stall periods. By frequency scaling PEs based on their transactional state, DVFS can increase the throughput and energy efficiency of HTMs. The second policy uses a transaction's conflict probability to reschedule transactions and clock gate aborted PEs, reducing overall contention and power consumption within the system. The proposed techniques are evaluated using three HTM configurations and, when combined, are shown to reduce the energy-delay-squared product (ED2P) of the STAMP and SPLASH-2 benchmarks by an average of 18%. Synthetic workloads are used to explore a wider range of program behaviors, and on these the optimizations are shown to reduce the ED2P by an average of 29%. For comparison, this work is shown to reduce the ED2P by up to 30% relative to previous proposals for energy reduction in HTMs (e.g., transaction serialization).
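
For readers unfamiliar with the metric, the energy-delay-squared product is conventionally defined as energy multiplied by the square of execution time, so it penalizes slowdowns more heavily than energy increases. The minimal sketch below shows how a relative ED2P reduction of the kind reported above could be computed. The function names and the input numbers are hypothetical and chosen only for illustration; they are not measurements or code from the paper.

```python
def ed2p(energy_joules: float, delay_seconds: float) -> float:
    """Energy-delay-squared product: E * D^2."""
    return energy_joules * delay_seconds ** 2


def relative_reduction(baseline: float, optimized: float) -> float:
    """Fractional reduction of `optimized` relative to `baseline`."""
    return (baseline - optimized) / baseline


if __name__ == "__main__":
    # Illustrative inputs only (assumed values, not results from the paper).
    baseline = ed2p(energy_joules=10.0, delay_seconds=2.0)
    optimized = ed2p(energy_joules=9.0, delay_seconds=1.9)
    print(f"ED2P reduction: {relative_reduction(baseline, optimized):.1%}")
```

Because delay is squared, a policy that saves energy but lengthens execution time can still increase ED2P, which is why the metric suits throughput/power trade-off studies such as this one.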