Supercompilers for parallel and vector computers
Supercompilers for parallel and vector computers
Circuits, Systems, and Signal Processing
Computational frameworks for the fast Fourier transform
Computational frameworks for the fast Fourier transform
A fast Fourier transform compiler
Proceedings of the ACM SIGPLAN 1999 conference on Programming language design and implementation
Exploiting superword level parallelism with multimedia instruction sets
PLDI '00 Proceedings of the ACM SIGPLAN 2000 conference on Programming language design and implementation
A vectorizing compiler for multimedia extensions
International Journal of Parallel Programming - Special issue on instruction-level parallelism and parallelizing compilation, Part 1
Graph-based code selection techniques for embedded processors
ACM Transactions on Design Automation of Electronic Systems (TODAES)
SPL: a language and compiler for DSP algorithms
Proceedings of the ACM SIGPLAN 2001 conference on Programming language design and implementation
Energy aware compilation for DSPs with SIMD instructions
Proceedings of the joint conference on Languages, compilers and tools for embedded systems: software and compilers for embedded systems
Automatic Performance Tuning in the UHFFT Library
ICCS '01 Proceedings of the International Conference on Computational Sciences-Part I
A SIMD Vectorizing Compiler for Digital Signal Processing Algorithms
IPDPS '02 Proceedings of the 16th International Parallel and Distributed Processing Symposium
Compiling for SIMD Within a Register
LCPC '98 Proceedings of the 11th International Workshop on Languages and Compilers for Parallel Computing
The Scc Compiler: SWARing at MMX 3DNow!
LCPC '99 Proceedings of the 12th International Workshop on Languages and Compilers for Parallel Computing
Short Vector Code Generation for the Discrete Fourier Transform
IPDPS '03 Proceedings of the 17th International Symposium on Parallel and Distributed Processing
Spiral: A Generator for Platform-Adapted Libraries of Signal Processing Algorithms
International Journal of High Performance Computing Applications
Architecture independent short vector FFTs
ICASSP '01 Proceedings of the Acoustics, Speech, and Signal Processing, 200. on IEEE International Conference - Volume 02
Vectorization techniques for the Blue Gene/L double FPU
IBM Journal of Research and Development
FFT algorithms for vector computers
Parallel Computing
Hi-index | 0.00 |
IBM is currently developing the new line of BlueGene/L supercomputers. The top-of-the-line installation is planned to be a 65,536 processors system featuring a peak performance of 360 Tflop/s. This system is supposed to lead the Top 500 list when being installed in 2005 at the Lawrence Livermore National Laboratory. This paper presents one of the first numerical kernels run on a prototype BlueGene/L machine. We tuned our formal vectorization approach as well as the Vienna MAP vectorizer to support BlueGene/L's custom two-way short vector SIMD “double” floating-point unit and connected the resulting methods to the automatic performance tuning systems Spiral and Fftw. Our approach produces automatically tuned high-performance FFT kernels for BlueGene/L that are up to 45% faster than the best scalar spiral generated code and up to 75% faster than Fftw when run on a single BlueGene/L processor.