Thread motion: fine-grained power management for multi-core systems

Authors:
Krishna K. Rangan; Gu-Yeon Wei; David Brooks
Affiliations:
Harvard University/Intel Massachusetts, Cambridge, MA, USA; Harvard University, Cambridge, MA, USA; Harvard University, Cambridge, MA, USA
Venue:
Proceedings of the 36th Annual International Symposium on Computer Architecture
Year:
2009


Abstract

Dynamic voltage and frequency scaling (DVFS) is a commonly used power-management scheme that dynamically adjusts power and performance to the time-varying needs of running programs. Unfortunately, conventional DVFS, which relies on off-chip regulators, is limited in temporal granularity and incurs high costs when considered for future multi-core systems. To overcome these challenges, this paper presents thread motion (TM), a fine-grained power-management scheme for chip multiprocessors (CMPs). Instead of incurring the high cost of changing the voltage and frequency of individual cores, TM rapidly moves threads among a mixture of cores with fixed but different power/performance levels, matching the time-varying computing needs of running applications to those cores. Results show that, for the same power budget, two voltage/frequency levels are sufficient to provide performance gains commensurate with idealized scenarios using per-core voltage control. Thread motion extends workload-based power management into the nanosecond realm and, for a given power budget, provides up to 20% better performance than coarse-grained DVFS.
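The intuition behind moving threads across fixed-level cores can be illustrated with a toy model. The sketch below is not the paper's algorithm or evaluation setup: the two core speeds, the memory-bound progress cap, the phase traces, and the function names are all hypothetical, and migration cost is ignored. It only shows why swapping threads at each interval can raise throughput while the set of powered cores, and hence the power budget, stays the same.

```python
from typing import Sequence

# Hypothetical relative speeds of two cores with fixed voltage/frequency levels.
FAST, SLOW = 100, 50
# Hypothetical cap on progress per interval while a thread is memory-bound:
# a stalled thread gains nothing from a faster clock beyond this rate.
MEM_BOUND_RATE = 40


def progress(compute_bound: bool, core_speed: int) -> int:
    """Work completed in one interval by a thread in the given phase."""
    return core_speed if compute_bound else min(core_speed, MEM_BOUND_RATE)


def static_assignment(phases_a: Sequence[bool], phases_b: Sequence[bool]) -> int:
    """Thread A stays on the fast core and thread B on the slow core."""
    return sum(progress(a, FAST) + progress(b, SLOW)
               for a, b in zip(phases_a, phases_b))


def thread_motion(phases_a: Sequence[bool], phases_b: Sequence[bool]) -> int:
    """Each interval, choose the placement with the higher combined progress.

    Assumes a swap is free; a real design would have to pay a migration cost.
    """
    total = 0
    for a, b in zip(phases_a, phases_b):
        keep = progress(a, FAST) + progress(b, SLOW)
        swap = progress(a, SLOW) + progress(b, FAST)
        total += max(keep, swap)
    return total


# True = compute-bound interval, False = memory-bound interval (made-up traces).
trace_a = [True, False, True, False]
trace_b = [False, True, False, True]

static_work = static_assignment(trace_a, trace_b)
motion_work = thread_motion(trace_a, trace_b)
print(f"static: {static_work}, thread motion: {motion_work}")
```

In this toy trace the two threads alternate phases out of step, so the fast core is always useful to exactly one of them; with traces where both threads are compute-bound at the same time, swapping yields no gain in this model.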