FPGAs are becoming increasingly attractive for high-precision scientific computation. One of the main obstacles to efficient resource utilization is that the resource usage of multipliers grows quadratically with operand size. Much research effort has been devoted to optimizing individual arithmetic and linear algebra operations. In this paper we take a higher-level approach and reduce the intermediate computational precision at the algorithmic level, optimizing accuracy with respect to the final result of the algorithm, which in our case is the accurate solution of partial differential equations (PDEs). Using the Poisson problem as a typical PDE example, we show that most intermediate operations can be computed with floats or even smaller formats, and that only very few operations (e.g. 1%) must be performed in double precision to obtain the same accuracy as a full double precision solver. The FPGA can thus be configured with many parallel float units rather than a few resource-hungry double units. To achieve this, we adapt the general concept of mixed precision iterative refinement methods to FPGAs and develop a fully pipelined version of the Conjugate Gradient solver. We combine this solver with different iterative refinement schemes and precision combinations to obtain resource-efficient mappings of the pipelined algorithm core onto the FPGA.
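The mixed precision iterative refinement scheme described in the abstract can be sketched in software terms: an inner Conjugate Gradient solve runs in single precision, while an outer loop computes residuals and applies corrections in double precision. The sketch below is a minimal illustration, not the paper's FPGA implementation; the 1-D Poisson test matrix, the unpreconditioned CG, and all function names and tolerances are illustrative assumptions.

```python
import numpy as np

def poisson_1d(n):
    # 1-D Poisson stencil [-1, 2, -1] as a dense matrix (toy test problem)
    return 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)

def cg(A, b, tol=1e-4, max_iter=1000):
    # Plain Conjugate Gradient; runs entirely in the dtype of A and b,
    # so passing float32 inputs gives a single-precision inner solve.
    x = np.zeros_like(b)
    r = b - A @ x
    p = r.copy()
    rs = r @ r
    for _ in range(max_iter):
        Ap = A @ p
        alpha = rs / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) < tol * np.linalg.norm(b):
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x

def mixed_precision_refine(A, b, outer_iters=10, tol=1e-12):
    # Outer refinement loop in float64; the bulk of the work (the inner
    # CG solves) is done in float32, mirroring the float/double split.
    A32 = A.astype(np.float32)
    x = np.zeros_like(b)
    for _ in range(outer_iters):
        r = b - A @ x                       # residual in double precision
        if np.linalg.norm(r) < tol * np.linalg.norm(b):
            break
        d = cg(A32, r.astype(np.float32))   # correction in single precision
        x += d.astype(np.float64)           # update in double precision
    return x
```

Each outer cycle cuts the double-precision residual by roughly the inner solver's tolerance, so only the few residual and update operations per cycle need double precision, which is the effect the paper exploits to fit many parallel float units on the FPGA.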