SPL: a language and compiler for DSP algorithms
Proceedings of the ACM SIGPLAN 2001 conference on Programming language design and implementation
Stochastic search for signal processing algorithm optimization
Proceedings of the 2001 ACM/IEEE conference on Supercomputing
Fast Automatic Generation of DSP Algorithms
ICCS '01 Proceedings of the International Conference on Computational Sciences-Part I
Architecture independent short vector FFTs
ICASSP '01 Proceedings of the Acoustics, Speech, and Signal Processing, 200. on IEEE International Conference - Volume 02
Programming portable optimized multimedia applications
MULTIMEDIA '03 Proceedings of the eleventh ACM international conference on Multimedia
Retargeting Sequential Image-Processing Programs for Data Parallel Execution
IEEE Transactions on Software Engineering
An Empirical Study On the Vectorization of Multimedia Applications for Multimedia Extensions
IPDPS '05 Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS'05) - Papers - Volume 01
Spiral: A Generator for Platform-Adapted Libraries of Signal Processing Algorithms
International Journal of High Performance Computing Applications
FFT program generation for shared memory: SMP and multicore
Proceedings of the 2006 ACM/IEEE conference on Supercomputing
Vectorization techniques for the Blue Gene/L double FPU
IBM Journal of Research and Development
A rewriting system for the vectorization of signal transforms
VECPAR'06 Proceedings of the 7th international conference on High performance computing for computational science
Automatically tuned FFTs for bluegene/l's double FPU
VECPAR'04 Proceedings of the 6th international conference on High Performance Computing for Computational Science
Automatic performance optimization of the discrete fourier transform on distributed memory computers
ISPA'06 Proceedings of the 4th international conference on Parallel and Distributed Processing and Applications
Efficient scheduling of recursive control flow on GPUs
Proceedings of the 27th international ACM conference on International conference on supercomputing
Hi-index | 0.00 |
Short vector SIMD instructions on recent microprocessors, such as SSE on Pentium III and 4, speed up code but are a major challenge to software developers. We present a compiler that automatically generates C code enhanced with short vector instructions for digital signal processing (DSP) transforms, such as the fast Fourier transform (FFT). The input to our compiler is a concise mathematical description of a DSP algorithm in the language SPL. SPL is used in the SPIRAL system (http://www.ece.cmu.edu/~spiral) to generate highly optimized architecture adapted implementations of DSP transforms. Interfacing our compiler with SPIRAL yields speed-ups of more than a factor of 2 in several important cases including the FFT and the discrete cosine transform (DCT) used in the JPEG compression standard. For the FFT our automatically generated code is competitive with the hand-coded Intel Math Kernel Library.