We present the Operator Language (OL), a framework for automatically generating fast numerical kernels. OL provides the structure needed to extend the program generation system Spiral beyond the transform domain. Using OL, we show how to automatically generate library functionality for the fast Fourier transform and for multiple non-transform kernels, including matrix-matrix multiplication, synthetic aperture radar (SAR), circular convolution, sorting networks, and Viterbi decoding. The control flow of these kernels is data-independent, which allows us to cast their algorithms as operator expressions. Using rewriting systems, a structural architecture model, and empirical search, we automatically generate C implementations for state-of-the-art multicore CPUs whose performance rivals that of hand-tuned implementations.
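As an illustration of what "casting an algorithm as an operator expression" can mean, the following minimal sketch writes one Cooley-Tukey FFT step as a product of four operators: DFT_mn = (DFT_m ⊗ I_n) · T · (I_m ⊗ DFT_n) · L, where L is a stride permutation and T a diagonal of twiddle factors. This is not OL syntax and not Spiral's API. All function names here (`dft`, `compose`, `tensor_I_A`, `tensor_A_I`, `stride_perm`, `twiddle`, `cooley_tukey`) are hypothetical and chosen only for this example. Every loop bound depends only on the sizes m and n, never on the input values, which is the data-independence property the abstract refers to.

```python
import cmath

def dft(n):
    """Dense DFT_n computed from the definition (base case, O(n^2))."""
    w = cmath.exp(-2j * cmath.pi / n)
    return lambda x: [sum(x[l] * w ** (k * l) for l in range(n)) for k in range(n)]

def compose(*ops):
    """Operator product: the rightmost operator is applied first."""
    def apply(x):
        for op in reversed(ops):
            x = op(x)
        return x
    return apply

def tensor_I_A(m, a, n):
    """I_m (x) A_n: apply A to each of m contiguous blocks of length n."""
    def apply(x):
        out = []
        for i in range(m):
            out.extend(a(x[i * n:(i + 1) * n]))
        return out
    return apply

def tensor_A_I(a, m, n):
    """A_m (x) I_n: apply A to each of n interleaved subvectors (stride n)."""
    def apply(x):
        out = [0] * (m * n)
        for j in range(n):
            out[j::n] = a(x[j::n])
        return out
    return apply

def stride_perm(m, n):
    """Stride permutation: output[i*n + j] = input[i + m*j]."""
    return lambda x: [x[i + m * j] for i in range(m) for j in range(n)]

def twiddle(m, n):
    """Diagonal operator: scale element i*n + j by w^(i*j), w = exp(-2*pi*i/(m*n))."""
    w = cmath.exp(-2j * cmath.pi / (m * n))
    return lambda x: [x[i * n + j] * w ** (i * j) for i in range(m) for j in range(n)]

def cooley_tukey(m, n):
    """DFT_{mn} assembled as an operator expression (hypothetical helper)."""
    return compose(
        tensor_A_I(dft(m), m, n),
        twiddle(m, n),
        tensor_I_A(m, dft(n), n),
        stride_perm(m, n),
    )

# Usage: compare the factored operator against the dense definition.
x = [complex(v) for v in range(8)]
max_err = max(abs(a - b) for a, b in zip(cooley_tukey(2, 4)(x), dft(8)(x)))
```

Because the expression is a plain composition of structured operators, a generator could in principle rewrite it (for example, by choosing a different split of mn or by fusing the permutation into the adjacent loop) before emitting code. The sketch only evaluates the expression directly and makes no attempt at such optimizations.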