In this paper, we propose two new VLSI architectures for computing the N-point discrete Fourier transform (DFT) and its inverse (IDFT) based on a radix-2 fast algorithm, where N is a power of two. The first part of this work presents a linear systolic array that requires log2 N complex multipliers and provides a throughput of one transform sample per clock cycle. Compared with related systolic designs based on either direct computation or a radix-2 fast algorithm, the proposed array achieves the same throughput with lower hardware complexity. This design suits high-speed real-time applications, but it becomes difficult to realize on a single chip when N is large. To balance chip area against processing speed, we further present a reduced-complexity design for DFT/IDFT computation. This alternative is a memory-based architecture consisting of one complex multiplier, two complex adders, and some special memory units, and it computes one transform sample every (log2 N + 1) clock cycles on average. Compared with the first design, the second offers lower throughput in exchange for lower hardware complexity. For N = 512, the memory-based design occupies a chip area of about 5742 × 5222 μm² and attains a throughput of up to 4 M transform samples per second in 0.6-μm CMOS technology. This area-time performance makes the design highly competitive for long-length DFT applications such as asymmetric digital subscriber lines (ADSL) and orthogonal frequency-division multiplexing (OFDM) systems.
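To make the radix-2 structure concrete, the following is a minimal software sketch of a generic iterative radix-2 decimation-in-time (Cooley–Tukey) DFT/IDFT. It is not the authors' specific formulation or hardware mapping; it only illustrates why such an algorithm has log2 N butterfly stages, each needing complex multiplications, which loosely corresponds to the one-multiplier-per-stage count of the systolic array. The function names `fft_radix2`, `dft_direct`, and `memory_based_cycles_per_sample` are hypothetical, and the 1/N scaling on the IDFT is an assumed convention.

```python
import cmath
import math


def fft_radix2(x, inverse=False):
    """Iterative radix-2 DIT FFT (or IDFT if inverse=True); len(x) must be a power of two."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    a = list(x)
    # Reorder the input into bit-reversed index order.
    j = 0
    for i in range(1, n):
        bit = n >> 1
        while j & bit:
            j ^= bit
            bit >>= 1
        j ^= bit
        if i < j:
            a[i], a[j] = a[j], a[i]
    sign = 1 if inverse else -1
    size = 2
    while size <= n:  # log2(n) butterfly stages in total
        half = size // 2
        for start in range(0, n, size):
            for k in range(half):
                w = cmath.exp(sign * 2j * math.pi * k / size)  # twiddle factor
                t = w * a[start + k + half]  # the complex multiply in each butterfly
                u = a[start + k]
                a[start + k] = u + t
                a[start + k + half] = u - t
        size *= 2
    if inverse:
        a = [v / n for v in a]  # assumed 1/N scaling convention for the IDFT
    return a


def dft_direct(x, inverse=False):
    """Direct O(N^2) DFT/IDFT, used only as a reference for checking."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [sum(x[m] * cmath.exp(sign * 2j * math.pi * k * m / n) for m in range(n))
           for k in range(n)]
    return [v / n for v in out] if inverse else out


def memory_based_cycles_per_sample(n):
    """Average cycles per output sample quoted for the memory-based design: log2 N + 1."""
    return (n.bit_length() - 1) + 1
```

For N = 512, `memory_based_cycles_per_sample(512)` gives 10 cycles per sample, so the quoted 4 M samples per second implies a clock of roughly 40 MHz. This is an inference from the abstract's figures, not a clock rate the abstract states.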