The increasing complexity of memory systems, introduced to narrow the gap between processor and memory speeds, has made it increasingly hard for compilers to optimize arbitrary code within a reasonable amount of time. With the emergence of multicore (CMP), multiprocessor (SMP) and hybrid shared memory multiprocessor architectures, achieving high efficiency is becoming even more challenging. To achieve high efficiency in performance-critical applications, domain-specific frameworks have been developed that aid compilers in scheduling the computations. We have developed a portable framework for the Fast Fourier Transform (FFT) that achieves high efficiency by automatically adapting to various architectural features. Adapting to parallel architectures by searching through all combinations of schedules (plans) is expensive, even when the search is conducted in parallel. In this paper, we develop heuristics to simplify the generation of better schedules for parallel FFT computations on CMP/SMP systems. We evaluate the performance of OpenMP and PThreads implementations of the FFT on a number of recent architectures. The performance of parallel FFT schedules is compared with that of the best plan generated for the sequential FFT, and the speedup for different numbers of processors is reported. Finally, we present a performance comparison between the UHFFT and FFTW implementations.
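The speedup comparison described above can be illustrated with a minimal sketch. It assumes the usual definitions, speedup = (time of best sequential plan) / (time of parallel plan) and efficiency = speedup / (number of threads); the abstract does not spell out its exact metric. The function names and the timings in the usage note are hypothetical and are not taken from the paper or from UHFFT.

```python
def speedup(t_sequential, t_parallel):
    """Speedup of a parallel plan relative to the best sequential plan.

    Both arguments are wall-clock execution times in the same unit.
    """
    if t_sequential <= 0 or t_parallel <= 0:
        raise ValueError("execution times must be positive")
    return t_sequential / t_parallel


def parallel_efficiency(t_sequential, t_parallel, num_threads):
    """Speedup divided by thread count; 1.0 corresponds to linear scaling."""
    if num_threads < 1:
        raise ValueError("num_threads must be at least 1")
    return speedup(t_sequential, t_parallel) / num_threads


def speedup_table(t_sequential, parallel_times):
    """Map each thread count to a (speedup, efficiency) pair.

    parallel_times: dict of {num_threads: measured time of the parallel plan}.
    """
    return {
        p: (speedup(t_sequential, t), parallel_efficiency(t_sequential, t, p))
        for p, t in sorted(parallel_times.items())
    }
```

As a usage example with made-up timings, `speedup_table(8.0, {1: 8.0, 2: 4.0, 4: 2.5})` reports a speedup of 3.2 and an efficiency of 0.8 for four threads. Using the best sequential plan as the baseline, rather than the parallel code run on one thread, avoids overstating scalability.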