We present a unified parallel architecture for four of the most important fast orthogonal transforms with trigonometric kernels: the complex-valued Fourier (CFFT), real-valued Fourier (RFFT), Hartley (FHT), and cosine (FCT) transforms. Of these, only the CFFT has a data flow that coincides with the one generated by the successive doubling method, which can be transformed into a constant geometry flow using perfect unshuffle or shuffle permutations. The other three require some type of hardware modification to guarantee the constant geometry of the successive doubling method. We have defined a generalized processing section (PS), based on a circular CORDIC rotator, for the four transforms. This PS permits the evaluation of the CFFT and FCT in n data recirculations and of the RFFT and FHT in n-1 data recirculations, where n is the number of stages of a transform of length N = r^n. In addition, the efficiency of the partitioned parallel architecture is optimal, because no cycles are lost in the systolic computation of the butterflies for any of the four transforms.
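
As an illustration of the constant geometry data flow mentioned above, the following is a minimal software sketch for the CFFT case only, assuming radix r = 2 (so N = 2^n) and a decimation-in-frequency formulation. It is not the architecture described in the paper: the function names constant_geometry_fft and bit_reverse are hypothetical, twiddle factors are computed directly rather than with a CORDIC rotator, and the RFFT, FHT, and FCT modifications are not modeled. Every stage reads the pair (i, i + N/2) and writes to (2i, 2i + 1), which is a perfect shuffle of the butterfly outputs, so the interconnection pattern is identical across all n recirculations.

```python
import cmath


def constant_geometry_fft(x):
    """Radix-2 DIF FFT with the same wiring at every stage.

    Assumes len(x) is a power of two. The result is returned in
    bit-reversed order, as is usual for an unscrambled DIF flow.
    """
    n_points = len(x)
    stages = n_points.bit_length() - 1
    if n_points < 1 or (1 << stages) != n_points:
        raise ValueError("length must be a power of two")

    half = n_points // 2
    data = [complex(v) for v in x]

    for s in range(stages):  # one pass per stage (one data recirculation)
        out = [0j] * n_points
        for i in range(half):
            a, b = data[i], data[i + half]
            # Twiddle exponent: clear the s low-order bits of i.
            exponent = (i >> s) << s
            w = cmath.exp(-2j * cmath.pi * exponent / n_points)
            # Perfect shuffle of the outputs: the pair lands at 2i and 2i + 1.
            out[2 * i] = a + b
            out[2 * i + 1] = (a - b) * w
        data = out
    return data


def bit_reverse(k, bits):
    """Reverse the lowest `bits` bits of the integer k."""
    r = 0
    for _ in range(bits):
        r = (r << 1) | (k & 1)
        k >>= 1
    return r
```

Under these assumptions, the k-th DFT coefficient of an 8-point input x can be read back as constant_geometry_fft(x)[bit_reverse(k, 3)].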