Fast Quadruple Precision Arithmetic Library on Parallel Computer SR11000/J2

Authors:
Takahiro Nagai;Hitoshi Yoshida;Hisayasu Kuroda;Yasumasa Kanada
Affiliations:
Dept. of Frontier Informatics, The University of Tokyo, Tokyo, Japan;Dept. of Frontier Informatics, The University of Tokyo, Tokyo, Japan;Dept. of Frontier Informatics, The University of Tokyo, Tokyo, Japan and The Information Technology Center, The University of Tokyo, Tokyo, Japan;Dept. of Frontier Informatics, The University of Tokyo, Tokyo, Japan and The Information Technology Center, The University of Tokyo, Tokyo, Japan
Venue:
ICCS '08 Proceedings of the 8th international conference on Computational Science, Part I
Year:
2008

Abstract

This paper introduces fast quadruple precision arithmetic for the four basic operations and for multiply-add operations. The proposed methods achieve a maximum speed-up of 5 times over gcc 4.1.1 on the POWER 5+ processor used in the parallel computer SR11000/J2. We also developed a fast quadruple precision vector library optimized for the POWER 5 architecture. Quadruple precision numbers, represented by the 128-bit long double data type, are emulated with a pair of 64-bit double values on the POWER 5+ processor of the SR11000/J2, with both the Hitachi Optimizing Compiler and gcc 4.1.1. Because the emulation must avoid rounding errors in every quadruple precision operation, it has a high computational cost. The proposed methods focus on optimizing the number of registers used and the instruction latency.
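
To make the pair-of-doubles emulation concrete, the following is a minimal sketch in C of addition and multiplication on such a pair, using the textbook error-free transformations (Knuth's two-sum and Dekker's splitting product). It is not the optimized method proposed in the paper: the type name dd_t and all function names are hypothetical, chosen only for illustration. On hardware with a fused multiply-add instruction, the error term of the product can instead be obtained with a single fused operation. The sketch assumes the compiler does not apply value-changing floating-point optimizations (for example, no -ffast-math).

```c
/* Hypothetical double-double type: the represented value is hi + lo,
   where lo holds the rounding error that hi could not capture. */
typedef struct { double hi, lo; } dd_t;

/* Knuth's two-sum: returns s and e with s + e == a + b exactly
   (barring overflow), where s is the rounded sum. */
static dd_t two_sum(double a, double b) {
    double s = a + b;
    double bb = s - a;
    double e = (a - (s - bb)) + (b - bb);
    dd_t r = { s, e };
    return r;
}

/* Fast renormalization; valid only when |a| >= |b| (or a == 0). */
static dd_t quick_two_sum(double a, double b) {
    double s = a + b;
    double e = b - (s - a);
    dd_t r = { s, e };
    return r;
}

/* Veltkamp split: a == hi + lo, each half fitting in about 26 bits,
   so that products of halves are exact in double precision. */
static void split(double a, double *hi, double *lo) {
    double t = 134217729.0 * a;   /* 2^27 + 1 */
    *hi = t - (t - a);
    *lo = a - *hi;
}

/* Dekker's product: returns p and e with p + e == a * b exactly
   (barring overflow and underflow). */
static dd_t two_prod(double a, double b) {
    double ahi, alo, bhi, blo;
    double p = a * b;
    split(a, &ahi, &alo);
    split(b, &bhi, &blo);
    double e = ((ahi * bhi - p) + ahi * blo + alo * bhi) + alo * blo;
    dd_t r = { p, e };
    return r;
}

/* Double-double addition (the simpler, slightly less accurate variant). */
static dd_t dd_add(dd_t x, dd_t y) {
    dd_t s = two_sum(x.hi, y.hi);
    double lo = s.lo + (x.lo + y.lo);
    return quick_two_sum(s.hi, lo);
}

/* Double-double multiplication; the lo * lo term is dropped because it
   lies below the precision of the result. */
static dd_t dd_mul(dd_t x, dd_t y) {
    dd_t p = two_prod(x.hi, y.hi);
    double lo = p.lo + (x.hi * y.lo + x.lo * y.hi);
    return quick_two_sum(p.hi, lo);
}
```

Even this simple variant spends roughly ten floating-point operations on one addition, and more on one multiplication, each depending on earlier results. That dependency chain is the source of the computational cost noted in the abstract, and it is why register usage and instruction latency are natural optimization targets.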