Accuracy and performance of graphics processors: A Quantum Monte Carlo application case study

  • Authors:
  • Jeremy S. Meredith, Gonzalo Alvarez, Thomas A. Maier, Thomas C. Schulthess, Jeffrey S. Vetter

  • Affiliations:
  • Oak Ridge National Laboratory, 1 Bethel Valley Road, MS 6173, Oak Ridge, TN 37831, USA (all authors)

  • Venue:
  • Parallel Computing
  • Year:
  • 2009


Abstract

The tradeoffs between accuracy and performance remain an unsolved problem when using Graphics Processing Units (GPUs) as general-purpose computation devices. Their high performance and low cost make them a desirable target for scientific computation, and new language efforts help address the programming challenges of data-parallel algorithms and memory management. But the original task of GPUs, real-time rendering, has traditionally kept accuracy as a secondary goal, and sacrifices have sometimes been made as a result. In fact, the widely deployed hardware is generally capable of only single-precision arithmetic, and even this accuracy is not necessarily equivalent to that of a commodity CPU. In this paper, we investigate the accuracy and performance characteristics of GPUs, including results from a preproduction double-precision-capable GPU. We then accelerate the full Quantum Monte Carlo simulation code DCA++, similarly investigating its tolerance to the precision of arithmetic delivered by GPUs. The results show that while DCA++ has some sensitivity to arithmetic precision, the single-precision GPU results were comparable to single-precision CPU results. Acceleration of the code on a fully GPU-enabled cluster showed that any remaining inaccuracy in GPU precision was negligible; sufficient accuracy was retained for scientifically meaningful results while still showing significant speedups.
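The precision sensitivity discussed in the abstract can be illustrated with a minimal sketch (not taken from DCA++ itself, and using hypothetical data): a Monte Carlo style estimator accumulates many small contributions, and a naive sequential sum in single precision drifts measurably away from the double-precision result as terms accumulate.

```python
import numpy as np

# Hypothetical sample data standing in for Monte Carlo measurements.
rng = np.random.default_rng(42)
samples = rng.random(100_000)

# Reference: sum accumulated in double precision (float64).
sum64 = np.sum(samples, dtype=np.float64)

# Naive sequential accumulation in single precision (float32):
# once the running total is large, each added term is rounded to
# roughly 7 decimal digits, and the rounding errors accumulate.
acc = np.float32(0.0)
for x in samples.astype(np.float32):
    acc += x

rel_err = abs(float(acc) - sum64) / sum64
print(f"float64 sum = {sum64:.6f}")
print(f"float32 sum = {float(acc):.6f}")
print(f"relative error = {rel_err:.2e}")
```

Whether an error of this magnitude matters is application-dependent, which is the point of the paper's study: DCA++ turned out to tolerate single precision, but that had to be verified empirically against double-precision CPU runs.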