SPIRAL is a generator for libraries of fast software implementations of linear signal processing transforms. These libraries are adapted to the computing platform and can be re-optimized when the hardware is upgraded or replaced. This paper describes the main components of SPIRAL: the mathematical framework that concisely describes signal transforms and their fast algorithms; the formula generator, which captures at the algorithmic level the degrees of freedom in expressing a particular signal processing transform; the formula translator, which encapsulates the compilation degrees of freedom when translating a specific algorithm into an actual code implementation; and, finally, an intelligent search engine that finds, within the large space of alternative formulas and implementations, the "best" match to the given computing platform. We present empirical data that demonstrate the high performance of SPIRAL-generated code.
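The idea of searching a space of mathematically equivalent algorithms can be illustrated with a toy example. The sketch below is not SPIRAL's formula generator, translator, or search engine; it is a minimal illustration under stated assumptions. It uses the Walsh-Hadamard transform (WHT) and the standard factorization WHT(2^(a+b)) = (WHT(2^a) kron I) (I kron WHT(2^b)), represents each algorithm as a split tree, and picks the fastest of a few randomly generated trees by timing them on the current machine. All function names (`wht_direct`, `wht_tree`, `random_tree`, `search`) are hypothetical, and random sampling stands in for whatever search strategy a real system would use.

```python
import random
import time


def wht_direct(x):
    """WHT by definition: y[i] = sum_j (-1)^popcount(i & j) * x[j]."""
    n = len(x)
    return [
        sum(-x[j] if bin(i & j).count("1") % 2 else x[j] for j in range(n))
        for i in range(n)
    ]


def log_size(tree):
    """Return n such that the tree describes a WHT of size 2^n."""
    if isinstance(tree, int):
        return tree
    return log_size(tree[0]) + log_size(tree[1])


def wht_tree(x, tree):
    """Apply the WHT following a split tree.

    A tree is an int k (leaf: compute WHT of size 2^k by definition) or a
    pair (left, right), meaning (WHT_left kron I) (I kron WHT_right).
    """
    if isinstance(tree, int):
        return wht_direct(x)
    left, right = tree
    rows, cols = 1 << log_size(left), 1 << log_size(right)
    # I kron WHT_right: transform each contiguous block of length cols.
    y = []
    for r in range(rows):
        y.extend(wht_tree(x[r * cols:(r + 1) * cols], right))
    # WHT_left kron I: transform each stride-cols column.
    out = [0] * len(x)
    for c in range(cols):
        out[c::cols] = wht_tree(y[c::cols], left)
    return out


def random_tree(n, rng):
    """Draw one random split tree for size 2^n (illustrative sampling rule)."""
    if n == 1 or rng.random() < 0.3:
        return n
    a = rng.randint(1, n - 1)
    return (random_tree(a, rng), random_tree(n - a, rng))


def time_tree(tree, x, repeats=3):
    """Best-of-repeats wall-clock time for one candidate algorithm."""
    best = float("inf")
    for _ in range(repeats):
        t0 = time.perf_counter()
        wht_tree(x, tree)
        best = min(best, time.perf_counter() - t0)
    return best


def search(n, trials=20, seed=0):
    """Time `trials` random candidates and return the fastest tree."""
    rng = random.Random(seed)
    x = [rng.randint(-9, 9) for _ in range(1 << n)]
    candidates = [random_tree(n, rng) for _ in range(trials)]
    return min(candidates, key=lambda t: time_tree(t, x))


if __name__ == "__main__":
    best = search(6)
    print("fastest split tree found for WHT of size 64:", best)
```

Every tree computes the same transform, so the choice among them affects only run time; which tree wins depends on the machine, which is the point of tuning by measurement rather than by a fixed rule.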