Spiral: A Generator for Platform-Adapted Libraries of Signal Processing Algorithms

Authors:
Markus Püschel;José M. F. Moura;Bryan Singer;Jianxin Xiong;Jeremy Johnson;David Padua;Manuela Veloso;Robert W. Johnson
Affiliations:
Department of Electrical and Computer Engineering, Carnegie Mellon University, Pittsburgh, PA 15213-3890, USA;Department of Electrical and Computer Engineering, Carnegie Mellon University, Pittsburgh, PA 15213-3890, USA;716 Quiet Pond Ct., Odenton, MD 21113, USA;3315 Digital Computer Laboratory, 1304 W Springfield Ave, Urbana, IL 61801, USA;Department of Computer Science, Drexel University, Philadelphia, PA 19104-2875, USA;Department of Computer Science, University of Illinois at Urbana-Champaign, 3318 Digital Computer Laboratory, Urbana, IL 61801, USA;School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213-3890, USA;3324 21st Ave. South, St. Cloud, MN 56301, USA
Venue:
International Journal of High Performance Computing Applications
Year:
2004

Abstract

SPIRAL is a generator for libraries of fast software implementations of linear signal processing transforms. These libraries are adapted to the computing platform and can be re-optimized as the hardware is upgraded or replaced. This paper describes the main components of SPIRAL: the mathematical framework that concisely describes signal transforms and their fast algorithms; the formula generator that captures, at the algorithmic level, the degrees of freedom in expressing a particular signal processing transform; the formula translator that encapsulates the compilation degrees of freedom when translating a specific algorithm into an actual code implementation; and, finally, an intelligent search engine that finds, within the large space of alternative formulas and implementations, the "best" match to the given computing platform. We present empirical data that demonstrate the high performance of SPIRAL-generated code.
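
The generate-and-search idea in the abstract can be illustrated with a toy sketch. This is not SPIRAL's implementation or interface: it assumes a discrete Fourier transform as the example transform, uses the general Cooley-Tukey factorization as the only source of algorithmic freedom, interprets each candidate directly rather than translating it to compiled code, and replaces the intelligent search with exhaustive timing. All names (`enumerate_plans`, `apply_plan`, `search`) are hypothetical.

```python
import cmath
import time


def naive_dft(x):
    """Direct O(n^2) DFT by definition; used as the leaf algorithm."""
    n = len(x)
    return [
        sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
        for k in range(n)
    ]


def size(plan):
    """A plan is an int n (direct DFT) or a pair (plan_n1, plan_n2)."""
    return plan if isinstance(plan, int) else size(plan[0]) * size(plan[1])


def enumerate_plans(n):
    """Toy 'formula generator': every recursive factorization n = n1 * n2."""
    yield n
    for n1 in range(2, n):
        if n % n1 == 0:
            for p1 in enumerate_plans(n1):
                for p2 in enumerate_plans(n // n1):
                    yield (p1, p2)


def apply_plan(plan, x):
    """Evaluate the DFT of x with the Cooley-Tukey split described by plan."""
    if isinstance(plan, int):
        return naive_dft(x)
    p1, p2 = plan
    n1, n2 = size(p1), size(p2)
    n = n1 * n2
    # n1 inner DFTs of size n2 on the strided sub-sequences x[j1::n1].
    inner = [apply_plan(p2, x[j1::n1]) for j1 in range(n1)]
    # Twiddle factors w_n^(j1 * k2).
    for j1 in range(n1):
        for k2 in range(n2):
            inner[j1][k2] *= cmath.exp(-2j * cmath.pi * j1 * k2 / n)
    # n2 outer DFTs of size n1; output index is k = n2 * k1 + k2.
    out = [0j] * n
    for k2 in range(n2):
        col = apply_plan(p1, [inner[j1][k2] for j1 in range(n1)])
        for k1 in range(n1):
            out[n2 * k1 + k2] = col[k1]
    return out


def time_plan(plan, x, repeats=3):
    """Best-of-`repeats` wall-clock time for one plan on this machine."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        apply_plan(plan, x)
        best = min(best, time.perf_counter() - start)
    return best


def search(n):
    """Toy 'search engine': time every plan and keep the fastest."""
    x = [complex(i % 7, -(i % 3)) for i in range(n)]
    return min(enumerate_plans(n), key=lambda p: time_plan(p, x))


if __name__ == "__main__":
    print("fastest plan for n = 16 on this machine:", search(16))
```

Which plan wins depends on the machine and the interpreter, which is the point of measuring rather than predicting. The number of plans grows quickly with the transform size, so exhaustive timing only works for small sizes in this sketch.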