A pipelined-loop-compatible architecture and algorithm to reduce variable-length sets of floating-point data on a reconfigurable computer

  • Authors:
  • Gerald R. Morris; Viktor K. Prasanna

  • Affiliations:
  • U.S. Army Engineer Research and Development Center, Major Shared Resource Center, 3909 Halls Ferry Road, Vicksburg, MS 39180, United States; University of Southern California, Department of Electrical Engineering, 3740 McClintock Avenue, Los Angeles, CA 90089, United States

  • Venue:
  • Journal of Parallel and Distributed Computing
  • Year:
  • 2008

Abstract

Reconfigurable computers (RCs) combine general-purpose processors (GPPs) with field programmable gate arrays (FPGAs). The FPGAs are reconfigured at run time to become application-specific processors that collaborate with the GPPs to execute the application. High-level language (HLL) to hardware description language (HDL) compilers allow the FPGA-based kernels to be generated using HLL-based programming rather than HDL-based hardware design. Unfortunately, the loops needed for floating-point reduction operations often cannot be pipelined by these HLL-HDL compilers. This capability gap prevents the development of a number of important FPGA-based kernels. This article describes a novel architecture and algorithm that allow the use of an HLL-HDL environment to implement high-performance FPGA-based kernels that reduce multiple, variable-length sets of floating-point data. A sparse matrix iterative solver is used to demonstrate the effectiveness of the reduction kernel. The FPGA-augmented version running on a contemporary RC is up to 2.4 times faster than the software-only version of the same solver running on the GPP. Conservative estimates show the solver will run up to 6.3 times faster than software on a next-generation RC.
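
To illustrate why a floating-point reduction loop resists pipelining, the sketch below contrasts a plain accumulation, where every addition waits on the previous result, with a generic interleaved partial-sum scheme. This is not the architecture or algorithm of the article, whose details the abstract does not give. The names `ADDER_LATENCY`, `reduce_naive`, `reduce_interleaved`, and `reduce_sets` are hypothetical, and the latency of 4 cycles is an assumed value chosen only for illustration. The code models the dependency structure in software; it does not model hardware timing.

```python
from typing import List, Sequence

# Assumed pipeline depth of a floating-point adder, in cycles (illustrative only).
ADDER_LATENCY = 4


def reduce_naive(values: Sequence[float]) -> float:
    """Plain accumulation: each addition needs the result of the previous one."""
    total = 0.0
    for v in values:
        total = total + v  # loop-carried dependency on `total`
    return total


def reduce_interleaved(values: Sequence[float], lanes: int = ADDER_LATENCY) -> float:
    """Accumulate into `lanes` independent partial sums, then combine them."""
    partial = [0.0] * lanes
    for i, v in enumerate(values):
        # Consecutive additions touch different lanes, so none of them
        # depends on the addition issued immediately before it.
        partial[i % lanes] += v
    total = 0.0
    for p in partial:
        total += p
    return total


def reduce_sets(flat: Sequence[float], lengths: Sequence[int]) -> List[float]:
    """Reduce back-to-back, variable-length sets stored in one flat stream."""
    results = []
    start = 0
    for n in lengths:
        results.append(reduce_interleaved(flat[start:start + n]))
        start += n
    return results


if __name__ == "__main__":
    # Three sets of lengths 1, 3, and 2, e.g. the per-row products of a
    # sparse matrix-vector multiply.
    print(reduce_sets([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], [1, 3, 2]))
```

Because floating-point addition is not associative, the interleaved order can round differently from the naive order on general inputs, so the two functions are not guaranteed to agree bit for bit.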