The Lucas-Lehmer test for Mersenne primality can be efficiently parallelized for GPU-based computation. The gpuLucas project implements an irrational-base discrete weighted transform (IBDWT) approach using balanced integers, non-power-of-two transforms, and carry-save radix representations. gpuLucas is written in CUDA and requires the double-precision floating-point capabilities of recent GPUs. Results show speedups of up to 7× over benchmark averages for optimized sequential code and factor-of-two speedups over CUDALucas, another GPU-based Lucas-Lehmer tester developed independently and with a different optimization strategy. This work demonstrates techniques for implementing GPU-based number-theoretic algorithms on very large numbers, including fast multiplication, prefix-sum-based carry propagation, and the use of carry-save arithmetic with balanced integers. It also presents timing profiles of convolution-based integer multiplication based on the IBDWT, in particular for non-power-of-two transforms, and establishes the usefulness of the software as a GPU benchmarking application and as a platform for large-integer and polynomial experimentation.
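
For orientation, the following is a minimal sequential sketch of the two ideas the abstract names, not the gpuLucas implementation. It assumes Python's built-in arbitrary-precision integers in place of the IBDWT-based GPU multiplication, and the function names `lucas_lehmer`, `to_balanced_digits`, and `from_balanced_digits` are hypothetical, chosen only for illustration. The first function shows the Lucas-Lehmer recurrence, whose repeated modular squaring is the step a GPU tester would accelerate; the other two show a balanced-digit representation, in which each digit lies in a range centered on zero.

```python
def lucas_lehmer(p):
    """Return True if the Mersenne number 2**p - 1 is prime.

    Assumes p is a prime exponent. The squaring inside the loop is the
    expensive step; here it uses Python big integers (an assumption of
    this sketch), not a transform-based multiplication.
    """
    if p == 2:
        return True  # 2**2 - 1 = 3 is prime; the recurrence needs p > 2
    m = (1 << p) - 1
    s = 4
    for _ in range(p - 2):
        s = (s * s - 2) % m
    return s == 0


def to_balanced_digits(n, base, length):
    """Little-endian digits of n >= 0 in `base`, each in [-base/2, base/2).

    Assumes `base` is even and `length` is large enough to hold n;
    any leftover carry past `length` digits is dropped.
    """
    half = base // 2
    digits = []
    for _ in range(length):
        d = n % base
        n //= base
        if d >= half:
            d -= base  # shift the digit into the balanced range
            n += 1     # and carry one into the next position
        digits.append(d)
    return digits


def from_balanced_digits(digits, base):
    """Inverse of to_balanced_digits: recombine digits into an integer."""
    return sum(d * base**i for i, d in enumerate(digits))
```

Balanced digits keep the magnitudes of intermediate values small, which is why such representations are commonly paired with floating-point transforms; how gpuLucas combines them with carry-save arithmetic is described in the paper itself and is not reproduced here.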