A new speculation technique to optimize floating-point performance while preserving bit-by-bit reproducibility

Authors:
Mikio Takeuchi; Hideaki Komatsu; Toshio Nakatani
Affiliations:
IBM Research, Tokyo Research Laboratory, Yamato, Kanagawa, Japan (all three authors)
Venue:
ICS '03: Proceedings of the 17th Annual International Conference on Supercomputing
Year:
2003

Abstract

Bit-by-bit reproducibility of floating-point results, as defined by the IEEE 754 standard, prohibits optimizations such as reassociation and the use of native operations such as fused multiply-add (FMA), and therefore significantly impairs floating-point performance. Recent network-oriented languages such as Java strictly conform to the standard, so their numerical computing performance is inherently lower than that of conventional languages.

In this paper, we propose a new software technique, called floating-point (FP) speculation, that optimizes floating-point performance while preserving the bit-by-bit reproducibility of the results. We execute fast unsafe code and slow verification code in parallel. The unsafe code does not wait for the verification code; it is immediately followed by the subsequent code, which uses the probable result of the unsafe code on the assumption that the speculation will succeed. The performance improvement of FP speculation comes from this earlier start of the subsequent code.

Unlike other speculation techniques, FP speculation requires no special instructions or hardware support. Instead, it exploits unused floating-point registers and execution units. It is therefore generally applicable to processor architectures that have sufficient floating-point resources.
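The commit-or-recover logic described in the abstract can be illustrated with a minimal Java sketch. It is not the authors' implementation. It assumes FMA as the "unsafe" operation and Java's two-rounding `a * b + c` as the reproducible reference. It also assumes the subsequent code is a side-effect-free function that can be re-executed on misspeculation. The names `FpSpeculationSketch`, `speculate`, and `recoveries` are hypothetical. The paper schedules the unsafe and verification code in parallel on otherwise idle FP units, whereas this sketch runs sequentially. It therefore shows only why the committed result stays bit-identical, not the speedup.

```java
import java.util.function.DoubleUnaryOperator;

// Hypothetical illustration of FP speculation: not the paper's code.
class FpSpeculationSketch {
    // Number of times verification failed and the subsequent code was re-run.
    static int recoveries = 0;

    /**
     * Returns subsequent(a * b + c) under Java's reproducible semantics
     * (multiply and add each rounded), but lets the subsequent code start
     * early on the single-rounding FMA result. Assumes `subsequent` is pure,
     * so discarding its speculative result is safe.
     */
    static double speculate(double a, double b, double c, DoubleUnaryOperator subsequent) {
        double unsafe = Math.fma(a, b, c);                      // fast path: one rounding
        double speculative = subsequent.applyAsDouble(unsafe);  // runs ahead on the probable value
        double strict = a * b + c;                              // verification: two roundings
        if (Double.doubleToRawLongBits(unsafe) == Double.doubleToRawLongBits(strict)) {
            // Identical input bits to a pure function give identical output bits: commit.
            return speculative;
        }
        // Misspeculation: discard the speculative work and redo it with the reproducible value.
        recoveries++;
        return subsequent.applyAsDouble(strict);
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for the code that consumes the speculated value.
        DoubleUnaryOperator subsequent = v -> v * 2.0 + 1.0;

        // 2*3+1 is exact either way, so the speculative result is committed.
        System.out.println(speculate(2.0, 3.0, 1.0, subsequent));   // 15.0

        // 0.1*10 rounds to exactly 1.0 before the add (strict result 0.0), but FMA
        // keeps the residual 2^-54, so verification fails and recovery runs.
        System.out.println(speculate(0.1, 10.0, -1.0, subsequent)); // 1.0
        System.out.println("recoveries = " + recoveries);           // recoveries = 1
    }
}
```

The bitwise comparison via `Double.doubleToRawLongBits` is deliberately conservative. Any difference, including differing NaN payloads, triggers recovery, so a committed result is never distinguishable from the strict one.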