Working sets, cache sizes, and node granularity issues for large-scale multiprocessors
ISCA '93 Proceedings of the 20th annual international symposium on computer architecture
Efficient transposition algorithms for large matrices
Proceedings of the 1993 ACM/IEEE conference on Supercomputing
Index Transformation Algorithms in a Linear Algebra Framework
IEEE Transactions on Parallel and Distributed Systems
The Block Distributed Memory Model
IEEE Transactions on Parallel and Distributed Systems
CALYPSO: a computer algebra library for parallel symbolic computation
PASCO '97 Proceedings of the second international symposium on Parallel symbolic computation
A Basic-Cycle Calculation Technique for Efficient Dynamic Data Redistribution
IEEE Transactions on Parallel and Distributed Systems
Development of a mathematical subroutine library for Fujitsu vector parallel processors
ICS '98 Proceedings of the 12th international conference on Supercomputing
Efficient Methods for kr → r and r → kr Array Redistribution
The Journal of Supercomputing
IEEE Transactions on Parallel and Distributed Systems
Proceedings of the eleventh annual ACM symposium on Parallel algorithms and architectures
Adapted diameters and the efficient computation of Fourier transforms on finite groups
Proceedings of the sixth annual ACM-SIAM symposium on Discrete algorithms
Design and Performance Evaluation of a Portable Parallel Library for Space-Time Adaptive Processing
IEEE Transactions on Parallel and Distributed Systems
An adaptive software library for fast Fourier transforms
Proceedings of the 14th international conference on Supercomputing
Multithreaded algorithms for the fast Fourier transform
Proceedings of the twelfth annual ACM symposium on Parallel algorithms and architectures
The Journal of Supercomputing
A Framework for the Design and Implementation of FFT Permutation Algorithms
IEEE Transactions on Parallel and Distributed Systems
Efficient Methods for Multi-Dimensional Array Redistribution
The Journal of Supercomputing
A Generalized Basic-Cycle Calculation Method for Efficient Array Redistribution
IEEE Transactions on Parallel and Distributed Systems
SPL: a language and compiler for DSP algorithms
Proceedings of the ACM SIGPLAN 2001 conference on Programming language design and implementation
A functional approach to radix-r FFTs
Progress in computer research
A Generalized Processor Mapping Technique for Array Redistribution
IEEE Transactions on Parallel and Distributed Systems
Automatic derivation and implementation of signal processing algorithms
ACM SIGSAM Bulletin
Proceedings of the 1994 ACM/IEEE conference on Supercomputing
Communication and memory requirements as the basis for mapping task and data parallel programs
Proceedings of the 1994 ACM/IEEE conference on Supercomputing
Journal of Scientific Computing
The FFT: An Algorithm the Whole Family Can Use
Computing in Science and Engineering
The Scalability of FFT on Parallel Computers
IEEE Transactions on Parallel and Distributed Systems
Parallel multiplication of a vector by a Kronecker product of matrices
Parallel numerical linear algebra
Automatic Performance Tuning in the UHFFT Library
ICCS '01 Proceedings of the International Conference on Computational Sciences-Part I
Semi-automatic Generation of Web-Based Computing Environments for Software Libraries
ICCS '02 Proceedings of the International Conference on Computational Science-Part I
A Blocking Algorithm for FFT on Cache-Based Processors
HPCN Europe 2001 Proceedings of the 9th International Conference on High-Performance Computing and Networking
Searching for the Best FFT Formulas with the SPL Compiler
LCPC '00 Proceedings of the 13th International Workshop on Languages and Compilers for Parallel Computing-Revised Papers
A Parallel 3-D FFT Algorithm on Clusters of Vector SMPs
PARA '00 Proceedings of the 5th International Workshop on Applied Parallel Computing, New Paradigms for HPC in Industry and Academia
A Blocking Algorithm for Parallel 1-D FFT on Shared-Memory Parallel Computers
PARA '02 Proceedings of the 6th International Conference on Applied Parallel Computing Advanced Scientific Computing
Quantum Wavelet Transforms: Fast Algorithms and Complete Circuits
QCQC '98 Selected papers from the First NASA International Conference on Quantum Computing and Quantum Communications
Performance of High-Accuracy PDE Solvers on a Self-Optimizing NUMA Architecture
Euro-Par '01 Proceedings of the 7th International Euro-Par Conference Manchester on Parallel Processing
A Blocking Algorithm for Parallel 1-D FFT on Clusters of PCs
Euro-Par '02 Proceedings of the 8th International Euro-Par Conference on Parallel Processing
Overlapped Four-Step FFT Computation
ParNum '99 Proceedings of the 4th International ACPC Conference Including Special Tracks on Parallel Numerics and Parallel Computing in Image Processing, Video Processing, and Multimedia: Parallel Computation
Concurrent Error Detection in Fast Unitary Transform Algorithms
DSN '01 Proceedings of the 2001 International Conference on Dependable Systems and Networks (formerly: FTCS)
Parallel Performance of a 3D Elliptic Solver
NAA '00 Revised Papers from the Second International Conference on Numerical Analysis and Its Applications
Proceedings of the 2002 ACM/IEEE conference on Supercomputing
High accuracy periodic solutions to the Sivashinsky equation
Journal of Computational Physics
Efficient metacomputing of elliptic linear and non-linear problems
Journal of Parallel and Distributed Computing - Special issue on computational grids
A parallel 1-D FFT algorithm for the Hitachi SR8000
Parallel Computing
Efficient 2D FFT implementation on mediaprocessors
Parallel Computing
Automatic parallelism in differentiation of Fourier transforms
Proceedings of the 2003 ACM symposium on Applied computing
Journal of Computational Physics
Using randomization to make recursive matrix algorithms practical
Journal of Functional Programming
On the effectiveness of functional language features: NAS benchmark FT
Journal of Functional Programming
Numerical valuation of options with jumps in the underlying
Applied Numerical Mathematics
Formal loop merging for signal transforms
Proceedings of the 2005 ACM SIGPLAN conference on Programming language design and implementation
Automatic generation of customized discrete Fourier transform IPs
Proceedings of the 42nd annual Design Automation Conference
International Journal of High Performance Computing Applications
Automatic Performance Tuning for Fast Fourier Transforms
International Journal of High Performance Computing Applications
Spiral: A Generator for Platform-Adapted Libraries of Signal Processing Algorithms
International Journal of High Performance Computing Applications
Journal of Computational and Applied Mathematics
DFTI---a new interface for Fast Fourier Transform libraries
ACM Transactions on Mathematical Software (TOMS)
A comrade-matrix-based derivation of the eight versions of fast cosine and sine transforms
Contemporary mathematics
FFT program generation for shared memory: SMP and multicore
Proceedings of the 2006 ACM/IEEE conference on Supercomputing
Journal of Computational and Applied Mathematics - Special issue: Applied computational inverse problems
Optimal bit-reversal using vector permutations
Proceedings of the nineteenth annual ACM symposium on Parallel algorithms and architectures
Scheduling FFT computation on SMP and multicore systems
Proceedings of the 21st annual international conference on Supercomputing
Generating symmetric DFTs and equivariant FFT algorithms
Proceedings of the 2007 international symposium on Symbolic and algebraic computation
Fast robust regression algorithms for problems with Toeplitz structure
Computational Statistics & Data Analysis
Automatic Generation of FFT for Translations of Multipole Expansions in Spherical Harmonics
International Journal of High Performance Computing Applications
Parallel computation of the eigenvalues of symmetric Toeplitz matrices through iterative methods
Journal of Parallel and Distributed Computing
Formal datapath representation and manipulation for implementing DSP transforms
Proceedings of the 45th annual Design Automation Conference
High performance discrete Fourier transforms on graphics processors
Proceedings of the 2008 ACM/IEEE conference on Supercomputing
Bandwidth intensive 3-D FFT kernel for GPUs using CUDA
Proceedings of the 2008 ACM/IEEE conference on Supercomputing
Efficient Multiplication Using Type 2 Optimal Normal Bases
WAIFI '07 Proceedings of the 1st international workshop on Arithmetic of Finite Fields
Large-Scale Image Deblurring in Java
ICCS '08 Proceedings of the 8th international conference on Computational Science, Part I
System Demonstration of Spiral: Generator for High-Performance Linear Transform Libraries
AMAST 2008 Proceedings of the 12th international conference on Algebraic Methodology and Software Technology
How to Write Fast Numerical Code: A Small Introduction
Generative and Transformational Techniques in Software Engineering II
An efficient implementation of a numerical method for a chemotaxis system
International Journal of Computer Mathematics - Recent Advances in Computational and Applied Mathematics in Science and Engineering
KSSOLV—a MATLAB toolbox for solving the Kohn-Sham equations
ACM Transactions on Mathematical Software (TOMS)
Permuting streaming data using RAMs
Journal of the ACM (JACM)
Hybrid Super/Subthreshold Design of a Low Power Scalable-Throughput FFT Architecture
HiPEAC '09 Proceedings of the 4th International Conference on High Performance Embedded Architectures and Compilers
Opendda: a Novel High-Performance Computational Framework for the Discrete Dipole Approximation
International Journal of High Performance Computing Applications
Computer generation of fast Fourier transforms for the Cell Broadband Engine
Proceedings of the 23rd international conference on Supercomputing
Computer Generation of General Size Linear Transform Libraries
Proceedings of the 7th annual IEEE/ACM International Symposium on Code Generation and Optimization
Using NFFT 3---A Software Library for Various Nonequispaced Fast Fourier Transforms
ACM Transactions on Mathematical Software (TOMS)
Operator Language: A Program Generation Framework for Fast Kernels
DSL '09 Proceedings of the IFIP TC 2 Working Conference on Domain-Specific Languages
Learning Representation and Control in Markov Decision Processes: New Frontiers
Foundations and Trends® in Machine Learning
Algebraic signal processing theory: Cooley-Tukey type algorithms for real DFTs
IEEE Transactions on Signal Processing
Auto-tuning 3-D FFT library for CUDA GPUs
Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis
Noniterative map reconstruction using sparse matrix representations
IEEE Transactions on Image Processing
Challenges of implementing cyber-physical security solutions in body area networks
BodyNets '09 Proceedings of the Fourth International Conference on Body Area Networks
Vectorization techniques for the Blue Gene/L double FPU
IBM Journal of Research and Development
ACM SIGGRAPH 2009 Courses
NET-COOP '09 Proceedings of the 3rd Euro-NF Conference on Network Control and Optimization
Radix-4 FFT algorithms with ordered input and output data
DSP'09 Proceedings of the 16th international conference on Digital Signal Processing
Journal of Computational and Applied Mathematics
International Journal of Applied Mathematics and Computer Science - Selected Problems of Computer Science and Control
EURASIP Journal on Advances in Signal Processing - Special issue on dynamic spectrum access for wireless networking
CODELAB: a developers' tool for efficient code generation and optimization
ICCS'03 Proceedings of the 2003 international conference on Computational science
A vector-parallel FFT with a user-specifiable data distribution scheme
ISPA'03 Proceedings of the 2003 international conference on Parallel and distributed processing and applications
A rewriting system for the vectorization of signal transforms
VECPAR'06 Proceedings of the 7th international conference on High performance computing for computational science
An OpenMP implementation of parallel FFT and its performance on IA-64 processors
WOMPAT'03 Proceedings of the OpenMP applications and tools 2003 international conference on OpenMP shared memory parallel programming
Performance/energy optimization of DSP transforms on the XScale processor
HiPEAC'07 Proceedings of the 2nd international conference on High performance embedded architectures and compilers
An implementation of parallel 1-D FFT using SSE3 instructions on dual-core processors
PARA'06 Proceedings of the 8th international conference on Applied parallel computing: state of the art in scientific computing
Explicit formulas for efficient multiplication in F_{3^{6m}}
SAC'07 Proceedings of the 14th international conference on Selected areas in cryptography
An adaptive interface for the efficient computation of the discrete sine transform
PPAM'07 Proceedings of the 7th international conference on Parallel processing and applied mathematics
Generating SIMD vectorized permutations
CC'08/ETAPS'08 Proceedings of the Joint European Conferences on Theory and Practice of Software 17th international conference on Compiler construction
Compact integration factor methods for complex domains and adaptive mesh refinement
Journal of Computational Physics
An empirically tuned 2D and 3D FFT library on CUDA GPU
Proceedings of the 24th ACM International Conference on Supercomputing
Computers & Mathematics with Applications
Multi-FFT Vectorization for the Cell Multicore Processor
CCGRID '10 Proceedings of the 2010 10th IEEE/ACM International Conference on Cluster, Cloud and Grid Computing
Revisiting Cramer's rule for solving dense linear systems
SpringSim '10 Proceedings of the 2010 Spring Simulation Multiconference
Pricing algorithms for financial derivatives
Algorithms and theory of computation handbook
Applications of FFT and structured matrices
Algorithms and theory of computation handbook
A GPU approach to the simulation of spatio-temporal dynamics in ultrasonic resonators
PPAM'09 Proceedings of the 8th international conference on Parallel processing and applied mathematics: Part I
PPAM'09 Proceedings of the 8th international conference on Parallel processing and applied mathematics: Part I
Implementation and tuning of a parallel symmetric Toeplitz eigensolver
Journal of Parallel and Distributed Computing
Matrix decomposition algorithms for elliptic boundary value problems: a survey
Numerical Algorithms
Auto-tuning of fast Fourier transform on graphics processors
Proceedings of the 16th ACM symposium on Principles and practice of parallel programming
Automatic SIMD vectorization of fast Fourier transforms for the Larrabee and AVX instruction sets
Proceedings of the international conference on Supercomputing
FPGA Architecture for 2D Discrete Fourier Transform Based on 2D Decomposition for Large-sized Data
Journal of Signal Processing Systems
A Fourth Order Hermitian Box-Scheme with Fast Solver for the Poisson Problem in a Square
Journal of Scientific Computing
Parallel performance of a 3D elliptic solver
NAA'04 Proceedings of the Third international conference on Numerical Analysis and its Applications
An efficient parallel solution of complex Toeplitz linear systems
PPAM'05 Proceedings of the 6th international conference on Parallel Processing and Applied Mathematics
A hybrid MPI/OpenMP implementation of a parallel 3-d FFT on SMP clusters
PPAM'05 Proceedings of the 6th international conference on Parallel Processing and Applied Mathematics
An efficient computational approach for multiframe blind deconvolution
Journal of Computational and Applied Mathematics
A parallel solution of Hermitian Toeplitz linear systems
ICCS'06 Proceedings of the 6th international conference on Computational Science - Volume Part I
Automatically tuned FFTs for Blue Gene/L's double FPU
VECPAR'04 Proceedings of the 6th international conference on High Performance Computing for Computational Science
Topology-Based hypercube structures for global communication in heterogeneous networks
Euro-Par'05 Proceedings of the 11th international Euro-Par conference on Parallel Processing
High performance computing for a financial application using fast Fourier transform
Euro-Par'05 Proceedings of the 11th international Euro-Par conference on Parallel Processing
An implementation of parallel 3-d FFT using short vector SIMD instructions on clusters of PCs
PARA'04 Proceedings of the 7th international conference on Applied Parallel Computing: state of the Art in Scientific Computing
The symmetric–Toeplitz linear system problem in parallel
ICCS'05 Proceedings of the 5th international conference on Computational Science - Volume Part I
High performance 3-D FFT using multiple CUDA GPUs
Proceedings of the 5th Annual Workshop on General Purpose Processing with Graphics Processing Units
Computer Generation of Hardware for Linear Digital Signal Processing Transforms
ACM Transactions on Design Automation of Electronic Systems (TODAES)
Hybrid super/subthreshold design of a low power scalable-throughput FFT architecture
Transactions on High-Performance Embedded Architectures and Compilers IV
Automatic performance optimization of the discrete Fourier transform on distributed memory computers
ISPA'06 Proceedings of the 4th international conference on Parallel and Distributed Processing and Applications
Computer generation of efficient software Viterbi decoders
HiPEAC'10 Proceedings of the 5th international conference on High Performance Embedded Architectures and Compilers
SIAM Journal on Matrix Analysis and Applications
FFTs and multiple collective communication on multiprocessor-node architectures
PPAM'11 Proceedings of the 9th international conference on Parallel Processing and Applied Mathematics - Volume Part I
A transpose-free in-place SIMD optimized FFT
ACM Transactions on Architecture and Code Optimization (TACO)
Journal of Computational and Applied Mathematics
A fast BIE iteration method for an arbitrary body in a flow of incompressible inviscid fluid
Journal of Computational and Applied Mathematics
A framework for low-communication 1-D FFT
SC '12 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
Scalable multi-GPU 3-D FFT for TSUBAME 2.0 supercomputer
SC '12 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
High performance 3D convolution for protein docking on IBM Blue Gene
ISPA'07 Proceedings of the 5th international conference on Parallel and Distributed Processing and Applications
A fourth order finite difference method for the Dirichlet biharmonic problem
Numerical Algorithms
Adaptive computation of self sorting in-place FFTs on hierarchical memory architectures
HPCC'07 Proceedings of the Third international conference on High Performance Computing and Communications
High performance FFT on SGI Altix 3700
HPCC'07 Proceedings of the Third international conference on High Performance Computing and Communications
An implementation of parallel 2-D FFT using Intel AVX instructions on multi-core processors
ICA3PP'12 Proceedings of the 12th international conference on Algorithms and Architectures for Parallel Processing - Volume Part II
Proceedings of the International Conference on Computer-Aided Design
Proceedings of the Conference on Design, Automation and Test in Europe
Tera-scale 1D FFT with low-communication algorithm and Intel® Xeon Phi™ coprocessors
SC '13 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
ACM Transactions on Mathematical Software (TOMS)
A level-set method for two-phase flows with moving contact line and insoluble surfactant
Journal of Computational Physics
A framework for low-communication 1-D FFT
Scientific Programming - Selected Papers from Super Computing 2012