Sabrewing: A lightweight architecture for combined floating-point and integer arithmetic

Authors:
Tom M. Bruintjes; Karel H. G. Walters; Sabih H. Gerez; Bert Molenkamp; Gerard J. M. Smit
Affiliations:
University of Twente; University of Twente; University of Twente; University of Twente; University of Twente
Venue:
ACM Transactions on Architecture and Code Optimization (TACO) - HIPEAC Papers
Year:
2012

Abstract

In spite of the fact that floating-point arithmetic is costly in terms of silicon area, the joint design of hardware for floating-point and integer arithmetic is seldom considered. While components like multipliers and adders can potentially be shared, floating-point and integer units in contemporary processors are practically disjoint. This work presents a new architecture which tightly integrates floating-point and integer arithmetic in a single datapath. It is mainly intended for use in low-power embedded digital signal processors and therefore the following design constraints were important: limited use of pipelining for the convenience of the compiler; maintaining compatibility with existing technology; minimal area and power consumption for applicability in embedded systems. The architecture is tailored to digital signal processing by combining floating-point fused multiply-add and integer multiply-accumulate. It could be deployed in a multi-core system-on-chip designed to support applications with and without dominance of floating-point calculations. The VHDL structural description of this architecture is available for download under BSD license. Besides being configurable at design time, it has been thoroughly checked for IEEE-754 compliance by means of a floating-point test suite originating from the IBM Research Labs. A proof-of-concept has also been implemented using STMicroelectronics 65 nm technology. This prototype supports 32-bit signed two's complement integers and 41-bit (8-bit exponent and 32-bit significand) floating-point numbers. Our evaluations show that over 67% energy and 19% area can be saved compared to a reference design in which floating-point and integer arithmetic are implemented separately. The area overhead caused by combining floating-point and integer is less than 5%. Implemented in ST's general-purpose CMOS technology, the design can operate at a frequency of 1.35 GHz, while 667 MHz can be achieved in low-power CMOS.
Considering that the entire datapath is partitioned in just three pipeline stages, and the fact that the design is intended for use in the low-power domain, these frequencies are adequate. They are in fact competitive with current technology low-power floating-point units. Post-layout estimates indicate that the required area of a low-power implementation can be as small as 0.04 mm². Power consumption is on the order of several milliwatts. Strengthened by the fact that clock gating could reduce power consumption even further, we think that a shared floating-point and integer architecture is a good choice for signal processing in low-power embedded systems.
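
The two operations the abstract names, floating-point fused multiply-add and integer multiply-accumulate, can be illustrated with a small software model. The sketch below is not the Sabrewing datapath and is not taken from the paper: it uses IEEE-754 double precision instead of the 41-bit format, all function names are hypothetical, and the wraparound behaviour of the integer accumulator is an assumption (a real design might saturate or keep a wider accumulator). It only shows why a fused multiply-add, which rounds once, can differ from a separate multiply followed by an add, which rounds twice.

```python
from fractions import Fraction

INT_BITS = 32  # integer width of the prototype, as stated in the abstract


def fma_single_rounding(a: float, b: float, c: float) -> float:
    """Model of a fused multiply-add: a*b + c is formed exactly, then rounded once.

    Uses double precision for illustration, not the 41-bit format.
    """
    exact = Fraction(a) * Fraction(b) + Fraction(c)
    return float(exact)  # single rounding to the nearest double


def mul_then_add(a: float, b: float, c: float) -> float:
    """Unfused reference: the product is rounded before the addition."""
    product = a * b      # first rounding
    return product + c   # second rounding


def wrap_int32(x: int) -> int:
    """Reduce x to a 32-bit signed two's complement value (wraparound assumed)."""
    x &= (1 << INT_BITS) - 1
    if x >= 1 << (INT_BITS - 1):
        x -= 1 << INT_BITS
    return x


def int_mac(acc: int, a: int, b: int) -> int:
    """Integer multiply-accumulate: acc + a*b, wrapped to 32 bits (assumption)."""
    return wrap_int32(acc + a * b)
```

With `a = 1 + 2**-30`, `b = 1 - 2**-30` and `c = -1.0`, the exact product is `1 - 2**-60`. The unfused version rounds that product to `1.0` and returns `0.0`, while the single-rounding version keeps the small residual `-2**-60`.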