Reconfigurable Processor Architectures for Mobile Phones
IPDPS '03 Proceedings of the 17th International Symposium on Parallel and Distributed Processing
A 333-MHz dual-MAC DSP architecture for next-generation wireless applications
ICASSP '01 Proceedings of the Acoustics, Speech, and Signal Processing, 200. on IEEE International Conference - Volume 02
VLSID '07 Proceedings of the 20th International Conference on VLSI Design held jointly with 6th International Conference: Embedded Systems
A new IDCT-DFT relationship reducing the IDCT computational cost
IEEE Transactions on Signal Processing
High-speed and low-power split-radix FFT
IEEE Transactions on Signal Processing
Fast multiplication-free QWDCT for DV coding standard
IEEE Transactions on Consumer Electronics
Streaming processors for next-generation mobile imaging applications
IEEE Communications Magazine
A cost-effective architecture for 8×8 two-dimensional DCT/IDCT using direct method
IEEE Transactions on Circuits and Systems for Video Technology
A new hardware-efficient algorithm and architecture for computation of 2-D DCTs on a linear array
IEEE Transactions on Circuits and Systems for Video Technology
A method of estimating coding PSNR using quantized DCT coefficients
IEEE Transactions on Circuits and Systems for Video Technology
New systolic array implementation of the 2-D discrete cosine transform and its inverse
IEEE Transactions on Circuits and Systems for Video Technology
A 100 MHz 2-D 8×8 DCT/IDCT processor for HDTV applications
IEEE Transactions on Circuits and Systems for Video Technology
Journal of Systems Architecture: the EUROMICRO Journal
High throughput DA-based DCT with high accuracy error-compensated adder tree
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
A high performance video transform engine by using space-time scheduling strategy
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Hi-index | 0.00 |
This investigation proposes a novel radix-42 algorithm with the low computational complexity of a radix-16 algorithm but the lower hardware requirement of a radix-4 algorithm. The proposed pipeline radix-42 single delay feedback path (R42SDF) architecture adopts a multiplierless radix-4 butterfly structure, based on the specific linear mapping of common factor algorithm (CFA), to support both 256-point fast Fourier transform/inverse fast Fourier transform (FFT/IFFT) and 8 × 8 2-D discrete cosine transform (DCT) modes following with the high efficient feedback shift registers architecture. The segment shift register (SSR) and overturn shift register (OSR) structure are adopted to minimize the register cost for the input reordering and post computation operations in the 8 × 8 2-D DCT mode, respectively. Moreover, the retrenched constant multiplier and eight-folded complex multiplier structures are adopted to decrease the multiplier cost and the coefficient ROM size with the complex conjugate symmetry rule and subexpression elimination technology. To further decrease the chip cost, a finite wordlength analysis is provided to indicate that the proposed architecture only requires a 13-bit internal wordlength to achieve 40-dB signal-to-noise ratio (SNR) performance in 256-point FFT/IFFT modes and high digital video (DV) compression quality in 8 × 8 2-D DCT mode. The comprehensive comparison results indicate that the proposed cost effective reconfigurable design has the smallest hardware requirement and largest hardware utilization among the tested architectures for the FFT/IFFT computation, and thus has the highest cost efficiency. The derivation and chip implementation results show that the proposed pipeline 256-point FFT/IFFT/2-D DCT triple-mode chip consumes 22.37 mW at 100 MHz at 1.2-V supply voltage in TSMC 0.13-µm CMOS process, which is very appropriate for the RSoCs IP of next-generation handheld devices.