Extracting optimal performance from modern computing platforms has become increasingly difficult over the last few years. The effect is particularly noticeable in computations of a mathematical nature, such as those needed in multimedia processing, communication, control, graphics, and scientific simulations: a straightforward implementation, e.g., in C, is often one or two orders of magnitude slower than the best possible code. The reason lies in optimizations that are known to be difficult, and often impossible, for compilers to perform: parallelization, vectorization, and locality optimizations. On the other hand, many mathematical applications spend most of their runtime in well-defined mathematical kernels such as matrix computations, Fourier transforms, interpolation, coding, and others. Since these kernels are likely to be needed for decades to come, it makes sense to build program generation systems that produce them automatically. The only input to such a generator would be the algorithm knowledge, in a suitable representation, and some information about the computing platform; the output is highly optimized, platform-tuned code. For new platforms, the code is regenerated; for new types of platforms, one extends the generator rather than rewriting the kernel code itself. With Spiral we have built such a system for the domain of linear transforms. In this talk we give a brief survey of the key techniques underlying Spiral: a domain-specific mathematical language; rewriting systems for different forms of parallelization and for computing the so-called recursion step closure, which improves locality in recursive code; and the use of machine learning to adapt code at installation time. Spiral-generated code has proven to be as good as, and sometimes faster than, any human-written code. As one example, Spiral has been used to generate part of Intel's commercial libraries IPP and MKL.
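To make the idea of a domain-specific mathematical language more concrete: in the SPL line of work, fast transform algorithms are written as structured sparse matrix factorizations. The following is a minimal sketch, assuming Python with NumPy, that writes the well-known Cooley-Tukey FFT as the breakdown rule DFT_{mn} = (DFT_m ⊗ I_n) T^{mn}_n (I_m ⊗ DFT_n) L^{mn}_m, builds each factor as a dense matrix, and checks the product against the DFT matrix. The helper names (`dft`, `twiddle`, `stride_perm`, `cooley_tukey`) are hypothetical and are not Spiral's API. Dense matrices are used here only to verify the identity; a generator would instead translate such a formula into loop code.

```python
import numpy as np


def dft(n: int) -> np.ndarray:
    """Dense DFT_n matrix with entries w^(k*l), where w = exp(-2*pi*i/n)."""
    k = np.arange(n)
    return np.exp(-2j * np.pi * np.outer(k, k) / n)


def twiddle(m: int, n: int) -> np.ndarray:
    """Diagonal twiddle matrix T^{mn}_n: entry at index i*n + j is w_{mn}^(i*j)."""
    i = np.arange(m).repeat(n)    # value of i at each position i*n + j
    j = np.tile(np.arange(n), m)  # value of j at each position i*n + j
    return np.diag(np.exp(-2j * np.pi * i * j / (m * n)))


def stride_perm(m: int, n: int) -> np.ndarray:
    """Stride permutation L^{mn}_m: gathers the input at stride m,
    i.e. (L x)[i*n + j] = x[j*m + i] for 0 <= i < m, 0 <= j < n."""
    size = m * n
    perm = np.zeros((size, size))
    for i in range(m):
        for j in range(n):
            perm[i * n + j, j * m + i] = 1.0
    return perm


def cooley_tukey(m: int, n: int) -> np.ndarray:
    """Product (DFT_m kron I_n) T^{mn}_n (I_m kron DFT_n) L^{mn}_m, read right to left:
    permute, m DFTs of size n, twiddle scaling, n strided DFTs of size m."""
    return (
        np.kron(dft(m), np.eye(n))
        @ twiddle(m, n)
        @ np.kron(np.eye(m), dft(n))
        @ stride_perm(m, n)
    )


if __name__ == "__main__":
    x = np.random.default_rng(0).standard_normal(12)
    y = cooley_tukey(3, 4) @ x
    print(np.allclose(y, np.fft.fft(x)))  # True
```

Roughly speaking, a factor of the form I_m ⊗ A is a loop of m independent applications of A to contiguous blocks, while A ⊗ I_n applies A to n interleaved, strided subvectors. This structure is what makes formula-level rewriting for parallelization and vectorization possible, and applying the rule recursively to DFT_m and DFT_n yields many alternative algorithms from which a search or learning procedure can choose.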