Optimization of power consumption in the iterative solution of sparse linear systems on graphics processors

  • Authors:
  • Hartwig Anzt;Maribel Castillo;Juan C. Fernández;Vincent Heuveline;Francisco D. Igual;Rafael Mayo;Enrique S. Quintana-Ortí

  • Affiliations:
  • Institute for Applied and Numerical Mathematics 4, Karlsruhe Institute of Technology, Karlsruhe, Germany 76133;Depto. de Ingeniería y Ciencia de Computadores, Universidad Jaume I, Castellón, Spain 12.071;Depto. de Ingeniería y Ciencia de Computadores, Universidad Jaume I, Castellón, Spain 12.071;Institute for Applied and Numerical Mathematics 4, Karlsruhe Institute of Technology, Karlsruhe, Germany 76133;Depto. de Ingeniería y Ciencia de Computadores, Universidad Jaume I, Castellón, Spain 12.071;Depto. de Ingeniería y Ciencia de Computadores, Universidad Jaume I, Castellón, Spain 12.071;Depto. de Ingeniería y Ciencia de Computadores, Universidad Jaume I, Castellón, Spain 12.071

  • Venue:
  • Computer Science - Research and Development
  • Year:
  • 2012

Abstract

In this paper, we analyze the power consumption of different GPU-accelerated iterative solver implementations enhanced with energy-saving techniques. Specifically, while kernels execute on the graphics accelerator, we manually set the host system to a power-efficient idle-wait state so as to leverage dynamic voltage and frequency control. While iterative refinement combined with mixed-precision arithmetic often reduces the execution time of an iterative solver on a graphics processor, it does not necessarily reduce the power consumption as well. To analyze the trade-off between computation time and power consumption, we compare a plain GMRES solver and its preconditioned variant to the mixed-precision iterative refinement implementations based on the respective solvers. Benchmark experiments conclusively reveal that using idle-wait during GPU kernel calls effectively leverages the power-saving tools provided by the hardware and improves the energy efficiency of the algorithm.
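
As an illustration of the mixed-precision iterative refinement scheme mentioned in the abstract, the following is a minimal sketch and not the authors' implementation. It assumes a small dense, diagonally dominant system. The outer loop computes the residual and updates the solution in double precision, while the inner correction solve runs in emulated single precision. The inner solver here is a plain Jacobi iteration standing in for the GMRES solver used in the paper, and everything runs on the CPU. The names `to_f32`, `jacobi_f32`, and `mixed_precision_refinement`, as well as the tolerance and sweep counts, are hypothetical choices made for this example.

```python
import struct


def to_f32(x):
    """Round a Python float (double precision) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def jacobi_f32(A, r, sweeps=25):
    """Approximately solve A c = r with every operation rounded to single precision.

    Assumption: a Jacobi iteration replaces the GMRES inner solver of the paper.
    """
    n = len(r)
    A32 = [[to_f32(a) for a in row] for row in A]
    r32 = [to_f32(v) for v in r]
    c = [0.0] * n
    for _ in range(sweeps):
        new_c = []
        for i in range(n):
            s = r32[i]
            for j in range(n):
                if j != i:
                    s = to_f32(s - to_f32(A32[i][j] * c[j]))
            new_c.append(to_f32(s / A32[i][i]))
        c = new_c
    return c


def residual(A, x, b):
    """Compute r = b - A x in double precision."""
    n = len(b)
    return [b[i] - sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]


def norm_inf(v):
    """Maximum absolute entry of a vector."""
    return max(abs(t) for t in v)


def mixed_precision_refinement(A, b, tol=1e-12, max_outer=50):
    """Refine x in double precision using single-precision correction solves.

    Returns the solution and the number of outer iterations performed.
    """
    x = [0.0] * len(b)
    for outer in range(max_outer):
        r = residual(A, x, b)            # high-precision residual
        if norm_inf(r) <= tol * norm_inf(b):
            return x, outer
        c = jacobi_f32(A, r)             # low-precision correction solve
        x = [xi + ci for xi, ci in zip(x, c)]  # high-precision update
    return x, max_outer


if __name__ == "__main__":
    # Hypothetical test system whose exact solution is [1, 2, 3].
    A = [[4.0, -1.0, 0.0], [-1.0, 4.0, -1.0], [0.0, -1.0, 4.0]]
    b = [2.0, 4.0, 10.0]
    x, outer = mixed_precision_refinement(A, b)
    print(x, outer)
```

The design point the sketch tries to capture is that only the residual computation and the solution update need high precision. The bulk of the arithmetic, in the inner solve, can run in the cheaper format, which is what makes the approach attractive on graphics processors. Whether this also lowers power consumption is the question the paper examines, and the sketch does not address it.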