NPC'05 Proceedings of the 2005 IFIP international conference on Network and Parallel Computing
In this paper, an empirical comparison is made between two parallel implementations of a one-dimensional fast Fourier transform (FFT) targeted at a symmetric multiprocessor (SMP). The paper compares the run-time characteristics and overhead (time complexity) of the two algorithms with those reported in previous research. The scalability of the two algorithms is also assessed using the isoefficiency function, and the effect of caches on performance is presented. The isoefficiency function is defined as the rate at which the data size must grow with the number of processors to keep efficiency constant. The two implementations are based on a tree algorithm and a transpose algorithm, respectively. In the tree algorithm, the speedup does not increase linearly with the number of processors, although superlinear speedup can be achieved in the two-processor case. The transpose algorithm obtained approximately linear speedup with respect to the number of processors with only a moderate increase in data size. Additional performance can be obtained by overlapping computation with communication and by making efficient use of caches.
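The isoefficiency function can be made concrete with the standard efficiency relation. In the sketch below, W denotes the serial work (for an n-point FFT, W grows as n log n), p the number of processors, and T_o(W, p) the total overhead summed over all processors; these are the usual textbook symbols, assumed here rather than taken from the paper.

```latex
\begin{align*}
T_P &= \frac{W + T_o(W,p)}{p}, \qquad
S = \frac{W}{T_P} = \frac{pW}{W + T_o(W,p)}, \qquad
E = \frac{S}{p} = \frac{1}{1 + T_o(W,p)/W}, \\[4pt]
E \text{ held constant} &\iff W = \frac{E}{1-E}\,T_o(W,p) = K\,T_o(W,p).
\end{align*}
```

Purely as an illustration (an assumed overhead model, not a result from the paper): if T_o(W, p) = c p log2 p, then n log2 n = K c p log2 p is satisfied asymptotically by n = Theta(p), so the data size only needs to grow about linearly with p. Superlinear speedup (E > 1) falls outside this model because it would require T_o < 0; one common cause in practice is that the combined caches of more processors hold a larger share of the data.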
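The abstract does not spell out how the tree algorithm partitions the work. A common reading, assumed here, is that the divide-and-conquer recursion tree of a radix-2 FFT is split among p processors: the bottom of the tree becomes p independent sub-FFTs on strided subsequences, and the upper log2(p) levels of butterflies are combined afterwards, with fewer independent tasks at each level. The Python sketch below illustrates that decomposition only; `tree_fft`, `butterfly`, `_assemble`, and `naive_dft` are hypothetical names, and because of CPython's global interpreter lock the thread pool shows the structure of the work rather than real speedup.

```python
import cmath
from concurrent.futures import ThreadPoolExecutor


def butterfly(even, odd):
    """Combine the FFTs of the even- and odd-indexed halves (radix-2 DIT step)."""
    n = 2 * len(even)
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def fft(x):
    """Serial recursive radix-2 FFT; len(x) must be a power of two."""
    if len(x) == 1:
        return [complex(x[0])]
    return butterfly(fft(x[0::2]), fft(x[1::2]))


def _assemble(leaves, r, s, p):
    """Return the FFT of the subsequence x[r::s] by walking back up the tree."""
    if s == p:  # reached a leaf that one worker computed
        return leaves[r]
    # x[r::s] splits into x[r::2s] (even positions) and x[r+s::2s] (odd positions)
    return butterfly(_assemble(leaves, r, 2 * s, p),
                     _assemble(leaves, r + s, 2 * s, p))


def tree_fft(x, p):
    """Hypothetical tree-style parallel FFT with p leaf tasks (p a power of two)."""
    n = len(x)
    if n & (n - 1) or p & (p - 1) or not 1 <= p <= n:
        raise ValueError("len(x) and p must be powers of two with p <= len(x)")
    # Bottom of the tree: p independent sub-FFTs on x[r::p], one task per worker
    with ThreadPoolExecutor(max_workers=p) as pool:
        leaves = list(pool.map(fft, [x[r::p] for r in range(p)]))
    # Top log2(p) levels, combined serially here: each level up the tree has
    # half as many independent combine steps as the level below it
    return _assemble(leaves, 0, 1, p)


def naive_dft(x):
    """O(n^2) reference DFT, used only to check the result."""
    n = len(x)
    return [sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
            for k in range(n)]


if __name__ == "__main__":
    data = [complex(i % 5, i % 3) for i in range(16)]
    err = max(abs(a - b) for a, b in zip(tree_fft(data, 4), naive_dft(data)))
    print(f"max error vs. naive DFT: {err:.2e}")
```

The narrowing toward the root is one plausible reason a tree decomposition scales sublinearly: near the top of the tree, fewer processors can be kept busy.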
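A transpose-based FFT is commonly formulated as the four-step algorithm: view the N = n1 * n2 input as an n1 x n2 array, run independent FFTs along one dimension, multiply by twiddle factors, transpose, and run independent FFTs along the other dimension. The sketch below assumes that formulation, since the abstract does not give the paper's exact data layout; `transpose_fft`, `n1`, `n2`, and `workers` are hypothetical names, and both FFT phases hand independent sub-transforms to a thread pool.

```python
import cmath
from concurrent.futures import ThreadPoolExecutor


def fft(x):
    """Serial recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k], out[k + n // 2] = even[k] + t, even[k] - t
    return out


def transpose_fft(x, n1, n2, workers=4):
    """Hypothetical four-step (transpose) FFT of length N = n1 * n2.

    n1 and n2 must be powers of two (the sub-FFTs are radix-2).
    Input index j = n2*j1 + j2, output index k = k1 + n1*k2,
    with 0 <= j1, k1 < n1 and 0 <= j2, k2 < n2.
    """
    N = n1 * n2
    if len(x) != N:
        raise ValueError("len(x) must equal n1 * n2")
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Step 1: n2 independent length-n1 FFTs over the strided columns x[j2::n2]
        cols = list(pool.map(fft, [x[j2::n2] for j2 in range(n2)]))
        # Step 2: multiply by the twiddle factors w_N^(j2*k1)
        for j2 in range(n2):
            for k1 in range(n1):
                cols[j2][k1] *= cmath.exp(-2j * cmath.pi * j2 * k1 / N)
        # Step 3: transpose so that each k1 row is contiguous; on an SMP this is
        # a memory reorganization, in a message-passing code an all-to-all exchange
        rows = [[cols[j2][k1] for j2 in range(n2)] for k1 in range(n1)]
        # Step 4: n1 independent length-n2 FFTs along the rows
        rows = list(pool.map(fft, rows))
    # Reassemble the output in natural order: X[k1 + n1*k2] = rows[k1][k2]
    out = [0j] * N
    for k1 in range(n1):
        for k2 in range(n2):
            out[k1 + n1 * k2] = rows[k1][k2]
    return out


if __name__ == "__main__":
    data = [complex(i % 5, i % 3) for i in range(16)]
    err = max(abs(a - b) for a, b in zip(transpose_fft(data, 4, 4), fft(data)))
    print(f"max error vs. serial FFT: {err:.2e}")
```

Because each FFT phase consists of n1 or n2 fully independent transforms, the work stays balanced across processors and the only data exchange is the transpose itself; the sketch makes no performance claim beyond that structural point.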