Parameterized hardware design on reconfigurable computers: an image processing case study

  • Authors:
  • Miaoqing Huang;Olivier Serres;Tarek El-Ghazawi;Gregory Newby

  • Affiliations:
  • Department of Computer Science and Computer Engineering, University of Arkansas, Fayetteville, AR;Department of Electrical and Computer Engineering, The George Washington University, Washington, DC;Department of Electrical and Computer Engineering, The George Washington University, Washington, DC;Arctic Region Supercomputing Center, University of Alaska Fairbanks, Fairbanks, AK

  • Venue:
  • International Journal of Reconfigurable Computing - Special issue on selected papers from SPL 2009 programmable logic and applications
  • Year:
  • 2010

Abstract

Reconfigurable Computers (RCs) with hardware (FPGA) co-processors can achieve significant performance improvements over traditional microprocessor (µP)-based computers for many scientific applications. The achievable speedup depends on the intrinsic parallelism of the target application as well as the characteristics of the target platform. In this work, we use image processing applications as a case study to demonstrate how hardware designs are parameterized by the co-processor architecture, particularly the data I/O, i.e., the local memory of the FPGA device and the interconnect between the FPGA and the µP. Applications that access data randomly have to use the local memory; image registration is a typical example of this category. In contrast, an application such as edge detection can read data directly through the interconnect in a sequential fashion. Two image registration algorithms, the exhaustive search algorithm and the Discrete Wavelet Transform (DWT)-based search algorithm, are implemented in hardware on the Xilinx Virtex-II Pro 50 FPGA of the Cray XD1 reconfigurable computer. The hardware implementations improve performance by 10× and 2×, respectively. For the category of applications that access the interconnect directly, the hardware implementation of Canny edge detection achieves a 544× speedup.
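
The contrast between the two data access patterns can be illustrated in software. The Python sketch below is not the authors' hardware design; it only mimics the access behavior under stated assumptions. `stream_gradient` stands in for the sequential category: it consumes pixels in raster order and keeps just over two image rows in a buffer, using a Sobel-style gradient as a simplified substitute for one stage of Canny edge detection. `exhaustive_translation_search` stands in for the random-access category: it re-reads the target image at shifted coordinates for every candidate. The function names, the translation-only search space, and the sum-of-absolute-differences score are all hypothetical choices made for illustration, since the abstract does not specify them.

```python
from collections import deque


def stream_gradient(pixels, width):
    """Consume pixels in raster order and yield (row, col, |gx| + |gy|).

    Only the most recent 2 * width + 3 samples are buffered, so the full
    image never has to be stored or addressed randomly (assumed model of
    reading directly from the interconnect).
    """
    window = deque(maxlen=2 * width + 3)
    for index, value in enumerate(pixels):
        window.append(value)
        row, col = divmod(index, width)
        if row < 2 or col < 2:
            continue  # a full 3x3 neighbourhood is not available yet

        def g(r, c):
            # Pixel at (row - 2 + r, col - 2 + c), counted back from the newest sample.
            return window[-1 - (2 - r) * width - (2 - c)]

        gx = (g(0, 2) + 2 * g(1, 2) + g(2, 2)) - (g(0, 0) + 2 * g(1, 0) + g(2, 0))
        gy = (g(2, 0) + 2 * g(2, 1) + g(2, 2)) - (g(0, 0) + 2 * g(0, 1) + g(0, 2))
        # The 3x3 window is centred one row up and one column left of the newest pixel.
        yield row - 1, col - 1, abs(gx) + abs(gy)


def exhaustive_translation_search(reference, target, max_shift):
    """Try every (dy, dx) shift and return the one with the lowest mean SAD.

    Each candidate re-reads the target at shifted coordinates, which is why
    this style of algorithm needs the whole image in addressable memory.
    Translation-only search and the SAD score are illustrative assumptions.
    """
    height, width = len(reference), len(reference[0])
    best = None
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            sad, count = 0, 0
            for y in range(height):
                for x in range(width):
                    sy, sx = y + dy, x + dx
                    if 0 <= sy < height and 0 <= sx < width:
                        sad += abs(reference[y][x] - target[sy][sx])
                        count += 1
            if count == 0:
                continue  # shift leaves no overlap between the images
            score = sad / count
            if best is None or score < best[0]:
                best = (score, dy, dx)
    return best[1], best[2]
```

In the streaming function, the buffer size depends only on the image width, which loosely corresponds to a line buffer in an FPGA pipeline. The search function's working set is the entire image pair, regardless of how the loops are ordered.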