High-performance signal processing on reconfigurable platforms

Authors:
Vijaykrishnan Narayanan;W. Kenneth Jenkins;Jung Sub Kim
Affiliations:
The Pennsylvania State University;The Pennsylvania State University;The Pennsylvania State University
Venue:
High-performance signal processing on reconfigurable platforms
Year:
2008

Citing 0
Cited 1

Accurate Area, Time and Power Models for FPGA-Based Implementations

Journal of Signal Processing Systems

Quantified Score

Hi-index	0.00

Visualization

Abstract

Reconfigurable platforms have become a promising solution for high performance signal processing applications due to their flexible resources, high operating frequency, and low power. However, the required knowledge of such a reconfigurable platform is growing as the complexities to use these platforms continue to increase. This work simplifies the use of reconfigurable platforms for targeted signal processing applications. The first contribution is the design of TANOR, a tool for accelerating applications utilizing numerical algorithms with large-size data sets. TANOR is targeted at solving a range of N-body interaction problems spanning from astrophysics to molecular dynamics. The TANOR design flow starts with a MATLAB description of a particular interaction function and parameter selection from a graphical user interface. Subsequently, TANOR automatically generates a configuration bitstream for a target FPGA along with associated drivers and control software necessary to execute the application on a host computer. Architectural exploration is facilitated through support for standard number representations such as single precision floating point in addition to support for fully custom fixed-point and floating point representations. Moreover, TANOR enables joint exploration of algorithmic and architectural variations in realizing efficient hardware accelerators. TANOR's effectiveness and adaptability is demonstrated by generating kernels for three different applications: namely, the gravitational kernel (used in astrophysics), the Gaussian kernel (common in image processing applications), and a force calculation kernel (applied in molecular dynamics). The results show that TANOR is capable of achieving identical accuracy with lower resource utilization, making it competitive with existing manually-designed custom accelerators. The second contribution focuses on designing an accelerator for Fast Fourier Transforms (FFT), which are used extensively in signal and digital image processing. Since FFT is such a versatile algorithm, much effort has been devoted to enhance its computation efficiency. Of particular interest is the two-dimensional (2D) FFT which is more computation- and bandwidth-intensive than the one-dimensional (1D) FFT. Traditionally, a 2D FFT is computed using Row-Column (RC) decomposition, where 1D FFTs are computed along the rows followed by 1D FFTs along the columns. Both application specific and reconfigurable hardware have been used for high-performance implementations of 2D FFT based applications. However, architectures based on RC decomposition are not efficient for large input size data due to memory bandwidth constraints. To address this problem, an efficient architecture to implement the 2D FFT for large size inputs based on a novel 2D decomposition algorithm is proposed. This architecture achieves very high throughput by exploiting the inherent parallelism due to the 2D decomposition and by overlapping communication with computation. A custom-designed high throughput memory interface block enables maximum utilization of the memory bandwidth. In addition, an automatic system generator is provided for mapping this architecture onto a reconfigurable platform of Xilinx Virtex4 and Virtex5 devices. For a 2K×2K input size, the proposed architecture provides a 96.7% improvement compared to RC decomposition based implementation under the same memory constraints, and also outperforms existing 2D FFT implementations. The proposed solutions can serve as the enabling blocks for implementing a wide range of applications that can be solved using Non-Uniform FFT. The first work can be used as the translational phase and second work can be used directly for the FFT computation. Further work is required to optimize and integrate these blocks as a single system.