In this paper, we propose optimized multicore designs for data-parallel Fast Fourier Transform (FFT) applications. The FFT is a fundamental algorithm that is widely used in digital systems, and its implementation on conventional architectures has been studied extensively. However, the evaluation of data-parallel FFT on Network-on-Chip (NoC) platforms has not been well addressed. We analyse data-parallel FFT in terms of its on-chip traffic patterns and introduce NoC designs optimized for those patterns. Experiments show that the execution times of our optimized designs are 12.1% and 18.3% shorter than that of the original NoC design.
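To make the notion of an on-chip traffic pattern concrete, the following is a minimal sketch of which cores would exchange data at each butterfly stage of a textbook radix-2 FFT. It assumes a block distribution (each core holds a contiguous slice of the input) and power-of-two sizes; the function name `fft_stage_partners` and this mapping are illustrative assumptions, not the mapping or designs evaluated in the paper.

```python
def fft_stage_partners(n_points, n_cores):
    """Per-stage partner core for a block-distributed radix-2 FFT.

    Hypothetical helper for illustration. Returns one entry per
    butterfly stage: None when every butterfly is local to a core,
    otherwise a list where entry c is the core that core c exchanges
    data with at that stage.
    """
    # Both sizes must be powers of two, with at least one point per core.
    assert n_points > 0 and n_points & (n_points - 1) == 0
    assert n_cores > 0 and n_cores & (n_cores - 1) == 0
    assert n_cores <= n_points

    block = n_points // n_cores          # points held by each core
    stages = n_points.bit_length() - 1   # log2(n_points)

    pattern = []
    for s in range(stages):
        span = 1 << s                    # element i pairs with i XOR span
        if span < block:
            pattern.append(None)         # both elements sit on the same core
        else:
            hop = span // block          # partner core index is c XOR hop
            pattern.append([c ^ hop for c in range(n_cores)])
    return pattern


if __name__ == "__main__":
    for stage, partners in enumerate(fft_stage_partners(16, 4)):
        print(stage, "local" if partners is None else partners)
```

Under these assumptions only log2(n_cores) of the log2(n_points) stages generate inter-core traffic, and in each of those stages every core talks to exactly one partner at a fixed XOR distance. This regularity is the kind of property an NoC design could exploit.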