DFT performance prediction in FFTW

Authors: Liang Gu, Xiaoming Li
Affiliations: University of Delaware (both authors)
Venue: LCPC'09, Proceedings of the 22nd International Conference on Languages and Compilers for Parallel Computing
Year: 2009


Abstract

Fastest Fourier Transform in the West (FFTW) is an adaptive FFT library that generates highly efficient Discrete Fourier Transform (DFT) implementations. It is one of the fastest FFT libraries available, and it outperforms many adaptive or hand-tuned DFT libraries. Its success relies largely on the huge search space spanned by several FFT algorithms and a set of compiler-generated C code fragments (called codelets) for small DFT sizes. FFTW finds the best algorithm empirically, by measuring the performance of different algorithm combinations. Although empirical search works very well for FFTW, the search does not explain why the best plan it finds performs best, and the search overhead grows polynomially as the DFT size increases. The alternative to empirical search is model-driven optimization. However, it is widely believed that model-driven optimization is inferior to empirical search and is particularly unable to handle problems as complex as DFT optimization. In this paper, we propose a model-driven DFT performance predictor that can replace the empirical search engine in FFTW. Our technique adapts to different architectures and automatically predicts the performance of DFT algorithms and codelets (including SIMD codelets). Our experiments on four test platforms show that this technique produces DFT implementations that achieve more than 95% of the performance of those found by the original FFTW while using less than 5% of its search overhead. More importantly, our models give insight into why different combinations of DFT algorithms perform differently on a processor, given its architectural features.
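
To make the contrast between empirical search and model-driven selection concrete, here is a minimal toy sketch in Python. It is not the predictor proposed in the paper and it does not use FFTW. The codelet sizes, the cost table `CODELET_COST`, and the one-line cost formula are all assumptions invented for illustration. The sketch enumerates candidate plans (ordered factorizations of the DFT size into codelet radices) and ranks them with a model instead of timing each one.

```python
# Hypothetical per-call cost of a size-r codelet, in arbitrary units.
# These numbers are invented for illustration; they are not measurements.
CODELET_COST = {2: 4.0, 4: 10.0, 8: 26.0, 16: 70.0}


def plans(n):
    """Yield every ordered factorization of n into available codelet sizes."""
    if n == 1:
        yield ()
        return
    for r in CODELET_COST:
        if n % r == 0:
            for rest in plans(n // r):
                yield (r,) + rest


def predicted_cost(n, plan):
    """Toy model (an assumption): a radix-r stage makes n // r codelet calls."""
    return sum((n // r) * CODELET_COST[r] for r in plan)


def best_plan_by_model(n):
    """Pick the plan with the lowest predicted cost; nothing is executed or timed."""
    return min(plans(n), key=lambda p: predicted_cost(n, p))


if __name__ == "__main__":
    n = 64
    print(len(list(plans(n))), "candidate plans for n =", n)
    print("model-selected plan:", best_plan_by_model(n))
```

An empirical planner would instead run and time every candidate, so its cost grows with the number of plans multiplied by the cost of each run. The model-driven version only evaluates a formula per plan, which is where the reduction in search overhead comes from. A realistic model would also have to account for memory hierarchy effects, strides, and SIMD codelets, all of which this sketch ignores.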