Multi-FFT Vectorization for the Cell Multicore Processor
CCGRID '10 Proceedings of the 2010 10th IEEE/ACM International Conference on Cluster, Cloud and Grid Computing
High performance power flow algorithm for symmetrical distribution networks with unbalanced loading
International Journal of Computer Applications in Technology
An implementation of parallel 2-D FFT using Intel AVX instructions on multi-core processors
ICA3PP'12 Proceedings of the 12th international conference on Algorithms and Architectures for Parallel Processing - Volume Part II
In this paper, an implementation of a parallel two-dimensional fast Fourier transform (FFT) using short vector SIMD instructions on multi-core processors is proposed. Combining vectorization with the block two-dimensional FFT algorithm is shown to improve performance effectively. We vectorized the FFT kernels using Intel's Streaming SIMD Extensions 3 (SSE3) instructions. Performance results for two-dimensional FFTs on multi-core processors are reported. We obtained over 2.7 GFLOPS on a dual-core Intel Xeon (2.8 GHz, two CPUs, four cores) and over 3.3 GFLOPS on an Intel Core2 Duo E6600 (2.4 GHz, one CPU, two cores) for a 2^10 × 2^10-point FFT.
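
The abstract does not spell out the block two-dimensional FFT, so the following is only a minimal sketch of the general idea, under stated assumptions: a row-column 2-D FFT in which the column pass gathers a small group of columns into a buffer, transforms them, and scatters them back. The function names (`fft1d`, `fft2d_blocked`) and the `block` parameter are hypothetical and chosen for illustration. The paper's kernels use SSE3 instructions and multiple cores; this pure-Python version uses neither and shows only the blocking structure.

```python
import cmath

def fft1d(x):
    """Recursive radix-2 decimation-in-time FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft1d(x[0::2])
    odd = fft1d(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        # Twiddle factor for the forward transform.
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

def fft2d_blocked(a, block=4):
    """Row-column 2-D FFT; the column pass handles `block` columns at a time.

    `block` is an illustrative tuning knob (an assumption, not a value from
    the paper). In a native implementation it would be sized so that the
    gathered columns fit in cache.
    """
    rows, cols = len(a), len(a[0])
    # Pass 1: transform every row.
    work = [fft1d(row) for row in a]
    # Pass 2: transform the columns, one group of `block` columns at a time.
    for c0 in range(0, cols, block):
        c1 = min(c0 + block, cols)
        # Gather: each buffer entry holds one full column.
        buf = [[work[r][c] for r in range(rows)] for c in range(c0, c1)]
        buf = [fft1d(col) for col in buf]
        # Scatter the transformed columns back into place.
        for j, c in enumerate(range(c0, c1)):
            for r in range(rows):
                work[r][c] = buf[j][r]
    return work
```

The result does not depend on `block`; only the memory access pattern changes. In a native implementation, the gathered buffer is also a natural place to apply SIMD across several columns at once, though whether the paper vectorizes exactly this way is not stated in the abstract.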