FFTs in external or hierarchical memory

Authors:
D. H. Bailey
Venue:
The Journal of Supercomputing
Year:
1990

Citing 0
Cited 69

Working sets, cache sizes, and node granularity issues for large-scale multiprocessors

ISCA '93 Proceedings of the 20th annual international symposium on computer architecture
An empirical comparison of the Kendall Square Research KSR-1 and Stanford DASH multiprocessors

Proceedings of the 1993 ACM/IEEE conference on Supercomputing
RISC microprocessors and scientific computing

Proceedings of the 1993 ACM/IEEE conference on Supercomputing
Efficient transposition algorithms for large matrices

Proceedings of the 1993 ACM/IEEE conference on Supercomputing
Compute intensity and the FFT

Proceedings of the 1993 ACM/IEEE conference on Supercomputing
The performance advantages of integrating block data transfer in cache-coherent multiprocessors

ASPLOS VI Proceedings of the sixth international conference on Architectural support for programming languages and operating systems
A Fortran 90-based multiprecision system

ACM Transactions on Mathematical Software (TOMS)
The SPLASH-2 programs: characterization and methodological considerations

ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
A performance evaluation of cluster architectures

SIGMETRICS '97 Proceedings of the 1997 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
Multiprocessor out-of-core FFTs with distributed memory and parallel disks (extended abstract)

Proceedings of the fifth workshop on I/O in parallel and distributed systems
CALYPSO: a computer algebra library for parallel symbolic computation

PASCO '97 Proceedings of the second international symposium on Parallel symbolic computation
Using network interface support to avoid asynchronous protocol processing in shared virtual memory systems

ISCA '99 Proceedings of the 26th annual international symposium on Computer architecture
A comparison of MPI, SHMEM and cache-coherent shared address space programming models on the SGI Origin2000

ICS '99 Proceedings of the 13th international conference on Supercomputing
High-Performance Radix-2, 3 and 5 Parallel 1-D Complex FFT Algorithms for Distributed-Memory Parallel Computers

The Journal of Supercomputing
Accelerating shared virtual memory via general-purpose network interface support

ACM Transactions on Computer Systems (TOCS)
A high performance parallel algorithm for 1-D FFT

Proceedings of the 1994 ACM/IEEE conference on Supercomputing
Communication and memory requirements as the basis for mapping task and data parallel programs

Proceedings of the 1994 ACM/IEEE conference on Supercomputing
A Comparison of MPI, SHMEM and Cache-Coherent Shared Address Space Programming Models on a Tightly-Coupled Multiprocessors

International Journal of Parallel Programming
The Scalability of FFT on Parallel Computers

IEEE Transactions on Parallel and Distributed Systems
A Blocking Algorithm for FFT on Cache-Based Processors

HPCN Europe 2001 Proceedings of the 9th International Conference on High-Performance Computing and Networking
A Parallel 3-D FFT Algorithm on Clusters of Vector SMPs

PARA '00 Proceedings of the 5th International Workshop on Applied Parallel Computing, New Paradigms for HPC in Industry and Academia
A Blocking Algorithm for Parallel 1-D FFT on Shared-Memory Parallel Computers

PARA '02 Proceedings of the 6th International Conference on Applied Parallel Computing Advanced Scientific Computing
A Blocking Algorithm for Parallel 1-D FFT on Clusters of PCs

Euro-Par '02 Proceedings of the 8th International Euro-Par Conference on Parallel Processing
CableS: Thread Control and Memory System Extensions for Shared Virtual Memory Clusters

WOMPAT '01 Proceedings of the International Workshop on OpenMP Applications and Tools: OpenMP Shared Memory Parallel Programming
Overlapped Four-Step FFT Computation

ParNum '99 Proceedings of the 4th International ACPC Conference Including Special Tracks on Parallel Numerics and Parallel Computing in Image Processing, Video Processing, and Multimedia: Parallel Computation
On the primality of n! ± 1 and 2×3×5×...× p±1

Mathematics of Computation
Cache-Oblivious Algorithms

FOCS '99 Proceedings of the 40th Annual Symposium on Foundations of Computer Science
Using System Emulation to Model Next-Generation Shared Virtual Memory Clusters

Cluster Computing
A parallel 1-D FFT algorithm for the Hitachi SR8000

Parallel Computing
Efficient 2D FFT implementation on mediaprocessors

Parallel Computing
Shared virtual memory clusters: bridging the cost-performance gap between SMPs and hardware DSM systems

Journal of Parallel and Distributed Computing
Performance characteristics of the Cray X1 and their implications for application performance tuning

Proceedings of the 18th annual international conference on Supercomputing
High Performance FFT Algorithms for Cache-Coherent Multiprocessors

International Journal of High Performance Computing Applications
Layout transformation support for the disk resident arrays framework

The Journal of Supercomputing
Wavelength Assignment for Realizing Parallel FFT on Regular Optical Networks

The Journal of Supercomputing
FFT program generation for shared memory: SMP and multicore

Proceedings of the 2006 ACM/IEEE conference on Supercomputing
Five-step FFT algorithm with reduced computational complexity

Information Processing Letters
Scheduling FFT computation on SMP and multicore systems

Proceedings of the 21st annual international conference on Supercomputing
A GMP-based implementation of Schönhage-Strassen's large integer multiplication algorithm

Proceedings of the 2007 international symposium on Symbolic and algebraic computation
Efficient parallel out-of-core matrix transposition

International Journal of High Performance Computing and Networking
High performance discrete Fourier transforms on graphics processors

Proceedings of the 2008 ACM/IEEE conference on Supercomputing
Efficient Adaptive Algorithms for Transposing Small and Large Matrices on Symmetric Multiprocessors

Informatica
A cache-friendly truncated FFT

Theoretical Computer Science
BSP2OMP: A Compiler for Translating BSP Programs to OpenMP

International Journal of Parallel, Emergent and Distributed Systems - Advances in Parallel and Distributed Computational Models
Parallel implementations of 1-D fast Fourier transform without interprocessor communication

International Journal of Computers and Applications
A vector-parallel FFT with a user-specifiable data distribution scheme

ISPA'03 Proceedings of the 2003 international conference on Parallel and distributed processing and applications
An OpenMP implementation of parallel FFT and its performance on IA-64 processors

WOMPAT'03 Proceedings of the OpenMP applications and tools 2003 international conference on OpenMP shared memory parallel programming
An implementation of parallel 1-D FFT using SSE3 instructions on dual-core processors

PARA'06 Proceedings of the 8th international conference on Applied parallel computing: state of the art in scientific computing
Parallel implementation of multiple-precision arithmetic and 2,576,980,370,000 decimal digits of π calculation

Parallel Computing
An in-place truncated Fourier transform and applications to polynomial multiplication

Proceedings of the 2010 International Symposium on Symbolic and Algebraic Computation
Pricing algorithms for financial derivatives

Algorithms and theory of computation handbook
Overlapping Methods of All-to-All Communication and FFT Algorithms for Torus-Connected Massively Parallel Supercomputers

Proceedings of the 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis
Data race avoidance and replay scheme for developing and debugging parallel programs on distributed shared memory systems

Parallel Computing
Process scheduling for future multicore processors

Proceedings of the Fifth International Workshop on Interconnection Network Architecture: On-Chip, Multi-Chip
Cache-Oblivious Algorithms

ACM Transactions on Algorithms (TALG)
A minimal average accessing time scheduler for multicore processors

ICA3PP'11 Proceedings of the 11th international conference on Algorithms and architectures for parallel processing - Volume Part II
A hybrid MPI/OpenMP implementation of a parallel 3-d FFT on SMP clusters

PPAM'05 Proceedings of the 6th international conference on Parallel Processing and Applied Mathematics
High performance computing for a financial application using fast Fourier transform

Euro-Par'05 Proceedings of the 11th international Euro-Par conference on Parallel Processing
An implementation of parallel 3-d FFT using short vector SIMD instructions on clusters of PCs

PARA'04 Proceedings of the 7th international conference on Applied Parallel Computing: state of the Art in Scientific Computing
Efficient layout transformation for disk-based multidimensional arrays

HiPC'04 Proceedings of the 11th international conference on High Performance Computing
Exploration of heuristic scheduling algorithms for 3D multicore processors

Proceedings of the 15th International Workshop on Software and Compilers for Embedded Systems
A greedy heuristic approximation scheduling algorithm for 3d multicore processors

Euro-Par'11 Proceedings of the 2011 international conference on Parallel Processing
Cache-conscious scheduling of streaming applications

Proceedings of the twenty-fourth annual ACM symposium on Parallelism in algorithms and architectures
A transpose-free in-place SIMD optimized FFT

ACM Transactions on Architecture and Code Optimization (TACO)
A framework for low-communication 1-D FFT

SC '12 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
High performance 3D convolution for protein docking on IBM Blue Gene

ISPA'07 Proceedings of the 5th international conference on Parallel and Distributed Processing and Applications
Tera-scale 1D FFT with low-communication algorithm and Intel® Xeon Phi™ coprocessors

SC '13 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
Optimized multicore architectures for data parallel fast Fourier transform

Proceedings of the 14th International Conference on Computer Systems and Technologies
A framework for low-communication 1-D FFT

Scientific Programming - Selected Papers from Super Computing 2012
