Adapting application execution in CMPs using helper threads

Authors:
Yang Ding;Mahmut Kandemir;Padma Raghavan;Mary Jane Irwin
Affiliations:
Department of Computer Science & Engineering, Pennsylvania State University, University Park, PA 16802, USA;Department of Computer Science & Engineering, Pennsylvania State University, University Park, PA 16802, USA;Department of Computer Science & Engineering, Pennsylvania State University, University Park, PA 16802, USA;Department of Computer Science & Engineering, Pennsylvania State University, University Park, PA 16802, USA
Venue:
Journal of Parallel and Distributed Computing
Year:
2009

Abstract

Two concurrent shifts, one in architecture (the move toward chip multiprocessors, or CMPs) and one in applications (the move toward increasingly data-intensive workloads), are making performance, energy efficiency, and CPU availability increasingly critical. CPU availability can change dynamically for several reasons, such as thermal overload, an increase in transient errors, or operating system scheduling. An important question in this context is how to adapt the execution of a given application on a CMP to changes in CPU availability at runtime. This paper studies that problem, targeting the energy-delay product (EDP) as the main metric to optimize. We first argue that adapting application execution to varying CPU availability requires considering three parameters: the number of CPUs to use, the number of application threads to run, and the voltage/frequency levels to employ (if the CMP has this capability). We then propose using helper threads to adapt application execution to changes in CPU availability, with the goal of minimizing the EDP. The helper thread runs in parallel with the application threads and tries to determine the ideal number of CPUs, number of threads, and voltage/frequency levels to employ at any given point in execution. We illustrate this idea using four applications (Fast Fourier Transform, MultiGrid, LU decomposition, and Conjugate Gradient) under different execution scenarios. Our experimental results indicate that significant EDP reductions are possible using helper threads. For example, we achieved up to 66.3%, 83.3%, 91.2%, and 94.2% savings in EDP when adjusting all the parameters properly in the applications FFT, MG, LU, and CG, respectively. We also discuss how our approach can be extended to address multi-programmed workloads.
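
To make the optimization target concrete, the following is a minimal sketch of the decision a helper thread faces: given the CPUs currently available, pick the CPU count, thread count, and frequency level with the lowest EDP (energy multiplied by delay). It is not the authors' algorithm. The `Config` type, the `toy_model` cost function with all of its constants, and the exhaustive search in `best_config` are assumptions made purely for illustration; the abstract does not say how the helper thread estimates energy and delay.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Config:
    """One candidate setting (hypothetical names, not from the paper)."""
    cpus: int
    threads: int
    freq_ghz: float


def edp(energy_joules: float, delay_seconds: float) -> float:
    """Energy-delay product; lower is better."""
    return energy_joules * delay_seconds


def toy_model(cfg: Config) -> tuple:
    """Return (energy, delay) from an ASSUMED analytic model.

    Every constant below is an illustrative assumption. A real helper
    thread would presumably rely on runtime measurements instead.
    """
    work = 100.0            # arbitrary amount of work
    serial_fraction = 0.1   # Amdahl-style non-parallel share
    static_watts = 0.5      # per-CPU static power
    oversub_penalty = 0.05  # slowdown per thread beyond the CPU count

    busy_cpus = min(cfg.cpus, cfg.threads)
    extra_threads = max(0, cfg.threads - cfg.cpus)
    delay = work * (serial_fraction + (1 - serial_fraction) / busy_cpus)
    delay /= cfg.freq_ghz
    delay *= 1.0 + oversub_penalty * extra_threads
    # Dynamic power is assumed to grow with the cube of frequency.
    power = cfg.cpus * (cfg.freq_ghz ** 3 + static_watts)
    return power * delay, delay


def best_config(available_cpus, freqs, model=toy_model):
    """Exhaustively search all settings and return the minimum-EDP one."""
    candidates = [
        Config(c, t, f)
        for c in range(1, available_cpus + 1)
        for t in range(1, 2 * available_cpus + 1)
        for f in freqs
    ]
    return min(candidates, key=lambda cfg: edp(*model(cfg)))


if __name__ == "__main__":
    # Re-run the selection as CPU availability drops, e.g. after a
    # thermal event removes cores from service.
    for available in (8, 4, 2):
        print(available, best_config(available, (1.0, 1.5, 2.0)))
```

Calling `best_config` again whenever availability changes mirrors the adaptation described above. Exhaustive search is used here only because the toy search space is tiny.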