Working sets, cache sizes, and node granularity issues for large-scale multiprocessors
ISCA '93 Proceedings of the 20th annual international symposium on computer architecture
An empirical comparison of the Kendall Square Research KSR-1 and Stanford DASH multiprocessors
Proceedings of the 1993 ACM/IEEE conference on Supercomputing
RISC microprocessors and scientific computing
Proceedings of the 1993 ACM/IEEE conference on Supercomputing
Efficient transposition algorithms for large matrices
Proceedings of the 1993 ACM/IEEE conference on Supercomputing
Proceedings of the 1993 ACM/IEEE conference on Supercomputing
The performance advantages of integrating block data transfer in cache-coherent multiprocessors
ASPLOS VI Proceedings of the sixth international conference on Architectural support for programming languages and operating systems
A Fortran 90-based multiprecision system
ACM Transactions on Mathematical Software (TOMS)
The SPLASH-2 programs: characterization and methodological considerations
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
A performance evaluation of cluster architectures
SIGMETRICS '97 Proceedings of the 1997 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
Multiprocessor out-of-core FFTs with distributed memory and parallel disks (extended abstract)
Proceedings of the fifth workshop on I/O in parallel and distributed systems
CALYPSO: a computer algebra library for parallel symbolic computation
PASCO '97 Proceedings of the second international symposium on Parallel symbolic computation
ISCA '99 Proceedings of the 26th annual international symposium on Computer architecture
ICS '99 Proceedings of the 13th international conference on Supercomputing
The Journal of Supercomputing
Accelerating shared virtual memory via general-purpose network interface support
ACM Transactions on Computer Systems (TOCS)
A high performance parallel algorithm for 1-D FFT
Proceedings of the 1994 ACM/IEEE conference on Supercomputing
Communication and memory requirements as the basis for mapping task and data parallel programs
Proceedings of the 1994 ACM/IEEE conference on Supercomputing
International Journal of Parallel Programming
The Scalability of FFT on Parallel Computers
IEEE Transactions on Parallel and Distributed Systems
A Blocking Algorithm for FFT on Cache-Based Processors
HPCN Europe 2001 Proceedings of the 9th International Conference on High-Performance Computing and Networking
A Parallel 3-D FFT Algorithm on Clusters of Vector SMPs
PARA '00 Proceedings of the 5th International Workshop on Applied Parallel Computing, New Paradigms for HPC in Industry and Academia
A Blocking Algorithm for Parallel 1-D FFT on Shared-Memory Parallel Computers
PARA '02 Proceedings of the 6th International Conference on Applied Parallel Computing Advanced Scientific Computing
A Blocking Algorithm for Parallel 1-D FFT on Clusters of PCs
Euro-Par '02 Proceedings of the 8th International Euro-Par Conference on Parallel Processing
CableS: Thread Control and Memory System Extensions for Shared Virtual Memory Clusters
WOMPAT '01 Proceedings of the International Workshop on OpenMP Applications and Tools: OpenMP Shared Memory Parallel Programming
Overlapped Four-Step FFT Computation
ParNum '99 Proceedings of the 4th International ACPC Conference Including Special Tracks on Parallel Numerics and Parallel Computing in Image Processing, Video Processing, and Multimedia: Parallel Computation
On the primality of n! ± 1 and 2 × 3 × 5 × ... × p ± 1
Mathematics of Computation
FOCS '99 Proceedings of the 40th Annual Symposium on Foundations of Computer Science
A parallel 1-D FFT algorithm for the Hitachi SR8000
Parallel Computing
Efficient 2D FFT implementation on mediaprocessors
Parallel Computing
Journal of Parallel and Distributed Computing
Performance characteristics of the Cray X1 and their implications for application performance tuning
Proceedings of the 18th annual international conference on Supercomputing
High Performance FFT Algorithms for Cache-Coherent Multiprocessors
International Journal of High Performance Computing Applications
Layout transformation support for the disk resident arrays framework
The Journal of Supercomputing
Wavelength Assignment for Realizing Parallel FFT on Regular Optical Networks
The Journal of Supercomputing
FFT program generation for shared memory: SMP and multicore
Proceedings of the 2006 ACM/IEEE conference on Supercomputing
Five-step FFT algorithm with reduced computational complexity
Information Processing Letters
Scheduling FFT computation on SMP and multicore systems
Proceedings of the 21st annual international conference on Supercomputing
A GMP-based implementation of Schönhage-Strassen's large integer multiplication algorithm
Proceedings of the 2007 international symposium on Symbolic and algebraic computation
Efficient parallel out-of-core matrix transposition
International Journal of High Performance Computing and Networking
High performance discrete Fourier transforms on graphics processors
Proceedings of the 2008 ACM/IEEE conference on Supercomputing
A cache-friendly truncated FFT
Theoretical Computer Science
Bsp2omp: A Compiler for Translating BSP Programs to OpenMP
International Journal of Parallel, Emergent and Distributed Systems - Advances in Parallel and Distributed Computational Models
Parallel implementations of 1-D fast Fourier transform without interprocessor communication
International Journal of Computers and Applications
A vector-parallel FFT with a user-specifiable data distribution scheme
ISPA'03 Proceedings of the 2003 international conference on Parallel and distributed processing and applications
An OpenMP implementation of parallel FFT and its performance on IA-64 processors
WOMPAT'03 Proceedings of the OpenMP applications and tools 2003 international conference on OpenMP shared memory parallel programming
An implementation of parallel 1-D FFT using SSE3 instructions on dual-core processors
PARA'06 Proceedings of the 8th international conference on Applied parallel computing: state of the art in scientific computing
An in-place truncated Fourier transform and applications to polynomial multiplication
Proceedings of the 2010 International Symposium on Symbolic and Algebraic Computation
Pricing algorithms for financial derivatives
Algorithms and theory of computation handbook
Proceedings of the 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis
Process scheduling for future multicore processors
Proceedings of the Fifth International Workshop on Interconnection Network Architecture: On-Chip, Multi-Chip
ACM Transactions on Algorithms (TALG)
A minimal average accessing time scheduler for multicore processors
ICA3PP'11 Proceedings of the 11th international conference on Algorithms and architectures for parallel processing - Volume Part II
A hybrid MPI/OpenMP implementation of a parallel 3-D FFT on SMP clusters
PPAM'05 Proceedings of the 6th international conference on Parallel Processing and Applied Mathematics
High performance computing for a financial application using fast Fourier transform
Euro-Par'05 Proceedings of the 11th international Euro-Par conference on Parallel Processing
An implementation of parallel 3-D FFT using short vector SIMD instructions on clusters of PCs
PARA'04 Proceedings of the 7th international conference on Applied Parallel Computing: state of the Art in Scientific Computing
Efficient layout transformation for disk-based multidimensional arrays
HiPC'04 Proceedings of the 11th international conference on High Performance Computing
Exploration of heuristic scheduling algorithms for 3D multicore processors
Proceedings of the 15th International Workshop on Software and Compilers for Embedded Systems
A greedy heuristic approximation scheduling algorithm for 3D multicore processors
Euro-Par'11 Proceedings of the 2011 international conference on Parallel Processing
Cache-conscious scheduling of streaming applications
Proceedings of the twenty-fourth annual ACM symposium on Parallelism in algorithms and architectures
A transpose-free in-place SIMD optimized FFT
ACM Transactions on Architecture and Code Optimization (TACO)
A framework for low-communication 1-D FFT
SC '12 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
High performance 3D convolution for protein docking on IBM Blue Gene
ISPA'07 Proceedings of the 5th international conference on Parallel and Distributed Processing and Applications
Tera-scale 1D FFT with low-communication algorithm and Intel® Xeon Phi™ coprocessors
SC '13 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
Optimized multicore architectures for data parallel fast Fourier transform
Proceedings of the 14th International Conference on Computer Systems and Technologies
A framework for low-communication 1-D FFT
Scientific Programming - Selected Papers from Super Computing 2012