In this survey paper, we compare native double precision solvers with emulated- and mixed-precision solvers for linear systems of equations as they typically arise in finite element discretisations. The emulation uses two single precision floating-point numbers to achieve higher precision, while mixed precision iterative refinement computes residuals and updates the solution vector in double precision but solves the residual systems in single precision. Both techniques have been known since the 1960s, but little attention has been devoted to their performance aspects. Motivated by changing paradigms in processor technology and the emergence of highly parallel devices with outstanding single precision performance, we adapt the emulation and mixed precision techniques to coupled hardware configurations in which the parallel devices serve as scientific co-processors. The performance advantages are examined with respect to speedups over a native double precision implementation (time aspect) and reduced chip area requirements (space aspect). The paper begins with an overview of the theoretical background, algorithmic approaches and suitable hardware architectures. We then employ several conjugate gradient (CG) and multigrid solvers and study their behaviour for different parameter settings of the iterative refinement technique. Concrete speedup factors are evaluated on a coupled hardware configuration consisting of a general-purpose CPU and a graphics processor. The complementary aspect of potential area savings is assessed on a field programmable gate array (FPGA). In the last part, we test the applicability of the proposed mixed precision schemes to ill-conditioned matrices. We conclude that the mixed precision approach works very well with the parallel co-processors, gaining speedup factors of four to five and area savings by factors of three to four, while maintaining the same accuracy as a reference solver executing everything in double precision.
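
The mixed precision iterative refinement described above can be illustrated with a minimal sketch. It is not the paper's implementation: the inner solver here is plain Gaussian elimination rather than CG or multigrid, single precision is only simulated by rounding every intermediate result through a 32-bit float, and the names `to_single`, `solve_single`, `refine`, `tol` and `max_iter` are hypothetical. The test matrix is the 1D tridiagonal stencil (-1, 2, -1), chosen as a small stand-in for a finite element system.

```python
import struct

def to_single(x):
    """Round a Python float (a double) to the nearest IEEE single precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]

def solve_single(A, b):
    """Low precision inner solver (assumed stand-in for CG/multigrid):
    Gaussian elimination with partial pivoting, every result rounded to single.
    Assumes A is square and nonsingular."""
    n = len(b)
    # Augmented matrix [A | b], stored in simulated single precision.
    M = [[to_single(v) for v in row] + [to_single(rhs)] for row, rhs in zip(A, b)]
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(M[i][k]))  # pivot row
        M[k], M[p] = M[p], M[k]
        for i in range(k + 1, n):
            f = to_single(M[i][k] / M[k][k])
            for j in range(k, n + 1):
                M[i][j] = to_single(M[i][j] - to_single(f * M[k][j]))
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = M[i][n]
        for j in range(i + 1, n):
            s = to_single(s - to_single(M[i][j] * x[j]))
        x[i] = to_single(s / M[i][i])
    return x

def refine(A, b, tol=1e-12, max_iter=20):
    """Mixed precision iterative refinement: residual and update in double,
    correction solve in (simulated) single. Returns (solution, iterations)."""
    n = len(b)
    x = [0.0] * n
    b_norm = max(abs(v) for v in b)
    for it in range(max_iter):
        # Residual r = b - A x, computed in double precision.
        r = [b[i] - sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
        if max(abs(v) for v in r) <= tol * b_norm:
            return x, it
        c = solve_single(A, r)                # correction in single precision
        x = [x[i] + c[i] for i in range(n)]   # update in double precision
    return x, max_iter

# Small example system: tridiagonal (-1, 2, -1) matrix with constant right-hand side.
n = 8
A = [[2.0 if i == j else -1.0 if abs(i - j) == 1 else 0.0 for j in range(n)]
     for i in range(n)]
b = [0.1] * n
x, iterations = refine(A, b)
```

In this sketch each refinement step removes most of the error left by the previous single precision solve, so for a well-conditioned system a few outer iterations are expected to bring the double precision residual below `tol`. The low precision solve dominates the work, which is the part the paper offloads to the co-processor.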