We have developed a high-performance 3D convolution library for protein docking on IBM Blue Gene. The algorithm is designed to exploit the limited locality of memory access in the 3D FFT by making full use of the cache memory structure. The 1D FFT used in the 3D convolution is optimized for PowerPC 440 FP2 processors. The number of SIMOMD instructions is minimized by computing two 1D FFTs simultaneously. The library achieves up to 2.16 Gflops (38.6% of peak) per node. The total performance of a shape complementarity search is estimated at 7 Tflops on a 4-rack Blue Gene system (4096 nodes).
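
For readers unfamiliar with the underlying technique, the following is a minimal sketch of an FFT-based 3D convolution, built from 1D FFTs applied along each axis in turn. It is not the library's implementation: the function names (`fft1d`, `fft3d`, `convolve3d_fft`, `convolve3d_direct`) are hypothetical, the grid is assumed to be cubic with a power-of-two edge length, the convolution is assumed to be circular (periodic), and none of the cache blocking or PowerPC 440 FP2 tuning described above is attempted.

```python
import cmath


def fft1d(x, inverse=False):
    """Recursive radix-2 FFT (unnormalized). len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft1d(x[0::2], inverse)
    odd = fft1d(x[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        # Twiddle factor applied to the odd-indexed half.
        t = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def fft3d(a, inverse=False):
    """3D FFT of a cubic nested list a[n][n][n] via 1D FFTs along each axis."""
    n = len(a)
    a = [[[complex(v) for v in row] for row in plane] for plane in a]
    # Axis 2: rows are contiguous, so this pass has the best locality.
    for i in range(n):
        for j in range(n):
            a[i][j] = fft1d(a[i][j], inverse)
    # Axis 1: strided access, gathered into a temporary before transforming.
    for i in range(n):
        for k in range(n):
            col = fft1d([a[i][j][k] for j in range(n)], inverse)
            for j in range(n):
                a[i][j][k] = col[j]
    # Axis 0: the largest stride, and so the least cache-friendly pass.
    for j in range(n):
        for k in range(n):
            col = fft1d([a[i][j][k] for i in range(n)], inverse)
            for i in range(n):
                a[i][j][k] = col[i]
    if inverse:
        scale = 1.0 / n ** 3
        a = [[[v * scale for v in row] for row in plane] for plane in a]
    return a


def convolve3d_fft(f, g):
    """Circular 3D convolution: inverse FFT of the pointwise spectrum product."""
    n = len(f)
    F, G = fft3d(f), fft3d(g)
    H = [[[F[i][j][k] * G[i][j][k] for k in range(n)]
          for j in range(n)] for i in range(n)]
    return fft3d(H, inverse=True)


def convolve3d_direct(f, g):
    """Reference O(n^6) circular convolution, used only to check the FFT path."""
    n = len(f)
    h = [[[0.0] * n for _ in range(n)] for _ in range(n)]
    for i in range(n):
        for j in range(n):
            for k in range(n):
                for a in range(n):
                    for b in range(n):
                        for c in range(n):
                            h[(i + a) % n][(j + b) % n][(k + c) % n] += (
                                f[i][j][k] * g[a][b][c])
    return h
```

On a small grid, `convolve3d_fft(f, g)` should agree with `convolve3d_direct(f, g)` up to floating-point rounding. The strided passes along axes 1 and 0 are where memory locality is poorest in a 3D FFT, which is the access pattern the cache-oriented design in the abstract addresses.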