Compilers: principles, techniques, and tools
Compilers: principles, techniques, and tools
The input/output complexity of sorting and related problems
Communications of the ACM
Discrete-time signal processing
Discrete-time signal processing
Fast fourier transforms: a tutorial review and a state of the art
Signal Processing
Factorization method for crystallographic Fourier transforms
Advances in Applied Mathematics
A framework for generating distributed-memory parallel programs for block recursive algorithms
Journal of Parallel and Distributed Computing
An analysis of dag-consistent distributed shared-memory algorithms
Proceedings of the eighth annual ACM symposium on Parallel algorithms and architectures
Using C++ template metaprograms
C++ gems
ACM Computing Surveys (CSUR)
Optimizing matrix multiply using PHiPAC: a portable, high-performance, ANSI C coding methodology
ICS '97 Proceedings of the 11th international conference on Supercomputing
The implementation of the Cilk-5 multithreaded language
PLDI '98 Proceedings of the ACM SIGPLAN 1998 conference on Programming language design and implementation
Advanced compiler design and implementation
Advanced compiler design and implementation
SPL: a language and compiler for DSP algorithms
Proceedings of the ACM SIGPLAN 2001 conference on Programming language design and implementation
The nofib Benchmark Suite of Haskell Programs
Proceedings of the 1992 Glasgow Workshop on Functional Programming
I/O complexity: The red-blue pebble game
STOC '81 Proceedings of the thirteenth annual ACM symposium on Theory of computing
The Fastest Fourier Transform in the West
The Fastest Fourier Transform in the West
Optimizing Matrix Multiply using PHiPAC: a Portable,High-Performance, ANSI C Coding Methodology
Optimizing Matrix Multiply using PHiPAC: a Portable,High-Performance, ANSI C Coding Methodology
Automatically Tuned Linear Algebra Software
Automatically Tuned Linear Algebra Software
Optimizing the performance of sparse matrix-vector multiplication
Optimizing the performance of sparse matrix-vector multiplication
Architecture independent short vector FFTs
ICASSP '01 Proceedings of the Acoustics, Speech, and Signal Processing, 200. on IEEE International Conference - Volume 02
Automatic generation of prime length FFT programs
IEEE Transactions on Signal Processing
Software assumptions failure tolerance: role, strategies, and visions
Architecting dependable systems VII
Auto-generation and auto-tuning of 3D stencil codes on GPU clusters
Proceedings of the Tenth International Symposium on Code Generation and Optimization
Hi-index | 0.00 |
The FFTW library for computing the discrete Fourier transform (DFT) has gained a wide acceptance in both academia and industry, because it provides excellent performance on a variety of machines (even competitive with or faster than equivalent libraries supplied by vendors). In FFTW, most of the performance-critical code was generated automatically by a special-purpose compiler, called genfft, that outputs C code. Written in Objective Caml, genfft can produce DFT programs for any input length, and it can specialize the DFT program for the common case where the input data are real instead of complex. Unexpectedly, genfft "discovered" algorithms that were previously unknown, and it was able to reduce the arithmetic complexity of some other existing algorithms. This paper describes the internals of this special-purpose compiler in some detail, and it argues that a specialized compiler is a valuable tool.