Adapting application execution in CMPs using helper threads

  • Authors:
  • Yang Ding; Mahmut Kandemir; Padma Raghavan; Mary Jane Irwin

  • Affiliations:
  • Department of Computer Science & Engineering, Pennsylvania State University, University Park, PA 16802, USA (all four authors)

  • Venue:
  • Journal of Parallel and Distributed Computing
  • Year:
  • 2009

Abstract

Two concurrent shifts, one in architecture (the move toward chip multiprocessors, CMPs) and one in applications (the move toward increasingly data-intensive workloads), are making performance, energy efficiency, and CPU availability increasingly critical. CPU availability can change dynamically for several reasons, such as thermal overload, an increase in transient errors, or operating system scheduling. An important question in this context is how to adapt the execution of a given application on a CMP to changes in CPU availability at runtime. Our paper studies this problem, targeting the energy-delay product (EDP) as the main metric to optimize. We first argue that, in adapting application execution to varying CPU availability, one needs to consider the number of CPUs to use, the number of application threads to accommodate, and the voltage/frequency levels to employ (if the CMP has this capability). We then propose using helper threads to adapt application execution to changes in CPU availability, with the goal of minimizing the EDP. The helper thread runs in parallel with the application threads and tries to determine the ideal number of CPUs, number of threads, and voltage/frequency levels to employ at any given point in execution. We illustrate this idea using four applications (Fast Fourier Transform, MultiGrid, LU decomposition, and Conjugate Gradient) under different execution scenarios. The results of our experiments are promising and indicate that significant EDP reductions are possible using helper threads. For example, we achieved EDP savings of up to 66.3%, 83.3%, 91.2%, and 94.2% for FFT, MG, LU, and CG, respectively, when all the parameters were adjusted properly. We also discuss how our approach can be extended to address multi-programmed workloads.
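
The selection problem described in the abstract can be illustrated with a minimal sketch. It is not the authors' helper-thread implementation: the `Config` type, the `toy_model` cost function, and every constant in it are hypothetical stand-ins chosen only to make the example runnable. The only parts taken from the abstract are the metric (EDP, the product of energy and delay) and the three knobs being searched (number of CPUs, number of threads, and frequency level).

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Config:
    """One candidate execution setting (hypothetical type for this sketch)."""
    cpus: int
    threads: int
    freq_ghz: float


def edp(energy_joules: float, delay_seconds: float) -> float:
    """Energy-delay product: energy multiplied by execution time."""
    return energy_joules * delay_seconds


def toy_model(cfg: Config) -> tuple[float, float]:
    """Return (energy, delay) for a configuration.

    ASSUMPTION: this is an invented analytical model, not measured data
    and not the model used in the paper.
    """
    work_seconds = 10.0      # assumed single-CPU runtime at 1 GHz
    serial_fraction = 0.1    # assumed non-parallelizable share
    parallelism = min(cfg.cpus, cfg.threads)
    delay = work_seconds * (
        serial_fraction + (1.0 - serial_fraction) / parallelism
    ) / cfg.freq_ghz
    # Assumed penalty when threads outnumber the CPUs they run on.
    if cfg.threads > cfg.cpus:
        delay *= 1.0 + 0.05 * (cfg.threads - cfg.cpus)
    # Assumed per-CPU power: cubic dynamic term plus a constant static term.
    power_watts = cfg.cpus * (cfg.freq_ghz ** 3 + 0.5)
    return power_watts * delay, delay


def best_config(available_cpus, thread_options, freq_options, model=toy_model):
    """Exhaustively pick the configuration with the lowest EDP."""
    candidates = (
        Config(c, t, f)
        for c, t, f in product(
            range(1, available_cpus + 1), thread_options, freq_options
        )
    )
    return min(candidates, key=lambda cfg: edp(*model(cfg)))


if __name__ == "__main__":
    # Re-run the search whenever the number of available CPUs changes.
    for available in (8, 4, 2):
        print(available, best_config(available, [1, 2, 4, 8], [1.0, 1.5, 2.0]))
```

In this sketch the search is rerun from scratch each time availability changes. A real helper thread would presumably rely on runtime measurements rather than a fixed formula, and might avoid exhaustive enumeration; those choices are outside what the abstract specifies.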