Area-efficient arithmetic expression evaluation using deeply pipelined floating-point cores

  • Authors:
  • Ronald Scrofano;Ling Zhuo;Viktor K. Prasanna

  • Affiliations:
  • Computer Systems Research Department, The Aerospace Corporation, Segundo, CA and Computer Science Department, University of Southern California, Los Angeles, CA;Ming Hsieh Department of Electrical Engineering, University of Southern California, Los Angeles, CA;Ming Hsieh Department of Electrical Engineering, University of Southern California, Los Angeles, CA

  • Venue:
  • IEEE Transactions on Very Large Scale Integration (VLSI) Systems
  • Year:
  • 2008

Quantified Score

Hi-index 0.00

Visualization

Abstract

Recently, it has become possible to implement floating-point cores on field-programmable gate arrays (FPGAs) to provide acceleration for the myriad applications that require high-performance floating-point arithmetic. To achieve high clock rates, floating-point cores for FPGAs must be deeply pipelined. This deep pipelining makes it difficult to reuse the same floating-point core for a series of dependent computations. However, floating-point cores use a great deal of area, so it is important to use as few of them in an architecture as possible. In this paper, we describe area-efficient architectures and algorithms for arithmetic expression evaluation. Such expression evaluation is necessary in applications from a wide variety of fields, including scientific computing and cognition. The proposed designs effectively hide the pipeline latency of the floating-point cores and use at most two floating-point cores for each type of operator in the expression. While best-suited for particular classes of expressions, the proposed designs can evaluate general expressions as well. Additionally, multiple expressions can be evaluated without reconfiguration. Experimental results show that the areas of our designs increase linearly with the number of types of operations in the expression and that our designs occupy less area and achieve higher throughput than designs generated by a commercial hard-ware compiler.