Fast Mersenne prime testing on the GPU

  • Authors:
  • Andrew Thall

  • Affiliations:
  • Alma College, Alma, MI

  • Venue:
  • Proceedings of the Fourth Workshop on General Purpose Processing on Graphics Processing Units
  • Year:
  • 2011

Quantified Score

Hi-index 0.00

Visualization

Abstract

The Lucas-Lehmer test for Mersenne primality can be efficiently parallelized for GPU-based computation. The gpuLucas project implements an irrational-base discrete weighted transform approach (IBDWT) using balanced-integers, non-power-of-two transforms, and carry-save radix representations. gpuLucas uses the CUDA programming language and requires the double-precision floating point capabilities of recent GPUs. Results show up to 7×speedups over benchmark averages for optimized sequential code and factor-of-two speedups over CUDALucas, another GPU-based Lucas-Lehmer tester developed independently and with a different optimization strategy. This work demonstrates techniques for implementing GPU-based number theoretic algorithms on very large numbers, including fast multiplication, prefix-sum-based carry-propagation, and the use of carry-save arithmetic with balanced integers. The work presents timing profiles of convolution-based integer multiplication based on the IBDWT, in particular for non-power-of-two transformations, and establishes the usefulness of the software as a GPU benchmarking application and as a platform for large-integer and polynomial experimentation.