We present efficient parallel algorithms for multiple-precision arithmetic on operands of more than several million decimal digits on distributed-memory parallel computers. Because multiplication is the key operation for fast multiple-precision arithmetic, we use a parallel implementation of floating-point real FFT-based multiplication. The release of propagated carries and borrows in multiple-precision addition, subtraction, and multiplication is also parallelized. More than 2.576 trillion decimal digits of π were computed on 640 nodes of the Appro Xtreme-X3 (648 nodes, 147.2 GFlops/node, 95.4 TFlops peak performance) in an elapsed time of 73 h 36 min, which includes the time required for verification.
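As a rough illustration of the two building blocks named in the abstract, the following serial Python sketch multiplies two integers with a floating-point FFT convolution and then releases the carries. It is not the paper's method: it uses a simple recursive complex FFT rather than a parallel real FFT, and the limb base `BASE = 1000` and all function names (`fft`, `to_limbs`, `from_limbs`, `release_carries`, `fft_multiply`) are assumptions chosen for illustration only.

```python
import cmath

BASE = 1000  # assumed limb size: three decimal digits per limb


def fft(a, invert=False):
    """Recursive radix-2 FFT (unscaled); len(a) must be a power of two."""
    n = len(a)
    if n == 1:
        return list(a)
    even = fft(a[0::2], invert)
    odd = fft(a[1::2], invert)
    sign = 1 if invert else -1
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def to_limbs(x):
    """Split a non-negative int into little-endian base-BASE limbs."""
    limbs = []
    while True:
        x, r = divmod(x, BASE)
        limbs.append(r)
        if x == 0:
            return limbs


def from_limbs(limbs):
    """Reassemble an int from little-endian base-BASE limbs."""
    value = 0
    for limb in reversed(limbs):
        value = value * BASE + limb
    return value


def release_carries(coeffs):
    """Sequential carry release: bring every coefficient into [0, BASE)."""
    out, carry = [], 0
    for c in coeffs:
        carry, digit = divmod(c + carry, BASE)
        out.append(digit)
    while carry:
        carry, digit = divmod(carry, BASE)
        out.append(digit)
    return out


def fft_multiply(x, y):
    """Multiply non-negative ints via FFT convolution of their limbs."""
    a, b = to_limbs(x), to_limbs(y)
    n = 1
    while n < len(a) + len(b):  # pad so the cyclic convolution is linear
        n *= 2
    fa = fft([complex(v) for v in a] + [0j] * (n - len(a)))
    fb = fft([complex(v) for v in b] + [0j] * (n - len(b)))
    prod = fft([p * q for p, q in zip(fa, fb)], invert=True)
    # Scale the inverse transform by 1/n and round to the nearest integer.
    coeffs = [int(round(v.real / n)) for v in prod]
    return from_limbs(release_carries(coeffs))
```

The rounding step is only safe while the floating-point error in each convolution coefficient stays well below 0.5, which is why this sketch uses small limbs. The carry loop here is strictly sequential; it is this step, along with the FFT itself, that the abstract describes as being parallelized.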