Quad and correctly rounded double precision math functions: portable and optimized for Intel architectures

Authors:
Alexey Ershov;Andrey Naraikin;Sergey Maidanov
Affiliations:
Intel Corporation;Intel Corporation;Intel Corporation
Venue:
Proceedings of the 2006 ACM symposium on Applied computing
Year:
2006

References:
Fast evaluation of elementary mathematical functions with correctly rounded last bit. ACM Transactions on Mathematical Software (TOMS)
Elementary functions: algorithms and implementation
Multiplications of Floating Point Expansions. ARITH '99 Proceedings of the 14th IEEE Symposium on Computer Arithmetic
Scientific Computing on Itanium-Based Systems
Algorithms for Quad-Double Precision Floating Point Arithmetic. ARITH '01 Proceedings of the 15th IEEE Symposium on Computer Arithmetic
The Software Vectorization Handbook: Applying Intel Multimedia Extensions for Maximum Performance
Towards the Post-Ultimate libm. ARITH '05 Proceedings of the 17th IEEE Symposium on Computer Arithmetic

Abstract

A fundamental part of a system's quad floating-point precision support is its companion mathematical library. We developed a hierarchical, C macro based methodology for implementing quad precision elementary functions that are both portable and optimized for Intel® architectures. By packing together two or three floating-point values natively supported in hardware, we leverage the extra precision to attain high accuracy and obtain measurable performance gains over traditional integer-based implementations. Our high-level language code is unified across several platforms, while native floating-point arithmetic sequences are the computational elements that underlie the macros and exploit the features of a particular architecture. This significantly reduces library maintenance cost and makes it possible to provide high performance quad functions for new processors. We also show how language extensions in the Intel® C/C++ compiler allow additional performance improvements on Intel® architectures. Finally, our experiments, based on recent advances of de Dinechin, Defour and Lauter, demonstrate that the methodology developed for quad precision functions yields correctly rounded double precision routines at a low implementation cost, with significant performance improvements over algorithms based on generic multi-precision packages.
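
The idea of packing two native floating-point values to gain precision can be illustrated with a minimal C sketch. This is not the authors' macro hierarchy, which the abstract does not spell out. The names `dd_t`, `TWO_SUM` and `dd_add` are hypothetical, and the sketch uses the classical error-free addition (Knuth's two-sum) as a stand-in for the "native floating-point arithmetic sequences" mentioned above. It assumes strict IEEE 754 binary64 evaluation (no `-ffast-math`, no extended-precision intermediates).

```c
/* Hypothetical type: a value stored as the unevaluated sum hi + lo
   of two doubles, with |lo| much smaller than |hi|. */
typedef struct { double hi, lo; } dd_t;

/* TWO_SUM (Knuth's error-free addition), written as a macro in the
   spirit of a macro-based design. After it runs, s is the rounded
   sum fl(a + b) and e is the rounding error, so a + b == s + e
   exactly (barring overflow). Assumes strict binary64 arithmetic. */
#define TWO_SUM(a, b, s, e) do {                 \
    double a_ = (a), b_ = (b);                   \
    double s_ = a_ + b_;                         \
    double t_ = s_ - a_;   /* part of b in s_ */ \
    (e) = (a_ - (s_ - t_)) + (b_ - t_);          \
    (s) = s_;                                    \
} while (0)

/* Simplified ("sloppy") addition of two packed values: add the high
   parts exactly, fold in the low parts, then renormalize so that the
   result is again a hi/lo pair. */
static dd_t dd_add(dd_t x, dd_t y)
{
    double s, e;
    dd_t r;

    TWO_SUM(x.hi, y.hi, s, e);   /* exact sum of the high parts */
    e += x.lo + y.lo;            /* accumulate the low-order terms */
    TWO_SUM(s, e, r.hi, r.lo);   /* renormalize into hi + lo */
    return r;
}
```

For instance, adding `{1.0, 0.0}` and `{0x1p-60, 0.0}` with `dd_add` keeps the small term in `lo` instead of losing it to rounding, which a single double addition cannot do. A production library would also need multiplication, special-value handling and tighter error bounds, none of which are shown here.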