Fast evaluation of elementary mathematical functions with correctly rounded last bit
ACM Transactions on Mathematical Software (TOMS)
Elementary functions: algorithms and implementation
Multiplications of Floating Point Expansions
ARITH '99 Proceedings of the 14th IEEE Symposium on Computer Arithmetic
Scientific Computing on Itanium-Based Systems
Algorithms for Quad-Double Precision Floating Point Arithmetic
ARITH '01 Proceedings of the 15th IEEE Symposium on Computer Arithmetic
Software Vectorization Handbook, The: Applying Intel Multimedia Extensions for Maximum Performance
Towards the Post-Ultimate libm
ARITH '05 Proceedings of the 17th IEEE Symposium on Computer Arithmetic
A fundamental part of a system's quad-precision floating-point support is its companion mathematical library. We developed a hierarchical, C-macro-based methodology for implementing quad-precision elementary functions that are both portable and optimized for Intel® architectures. By packing together two or three floating-point values that the hardware supports natively, we leverage the extra precision to attain high accuracy and obtain measurable performance gains over traditional integer-based implementations. Our high-level language code is unified across several platforms, while native floating-point arithmetic sequences are the computational elements that underlie the macros and exploit the features of a particular architecture. This significantly reduces library maintenance cost and makes it possible to provide high-performance quad-precision functions for new processors. We also show how language extensions in the Intel® C/C++ compiler allow additional performance improvements on Intel® architectures. Finally, our experiments, based on recent advances by de Dinechin, Defour and Lauter, demonstrate that the methodology developed for quad-precision functions also yields correctly rounded double-precision routines, at a low implementation cost and with significant performance improvements over algorithms based on generic multi-precision packages.