The increasing gap between processor and main-memory speeds has led to hardware architectures with a growing number of caches to reduce average memory access times. Such deep memory hierarchies make the sequential and parallel efficiency of computer programs strongly dependent on their memory access pattern. In this paper, we consider embedded Runge-Kutta methods for the solution of ordinary differential equations and study their efficient implementation on different parallel platforms. In particular, we focus on ordinary differential equations characterized by the special access pattern that results from the spatial discretization of partial differential equations by the method of lines. We explore how the potential parallelism in the stage vector computation of such equations can be exploited in a pipelining approach, leading to better locality and higher scalability. Experiments show that this approach yields efficiency improvements on several recent sequential and parallel computers.
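A minimal sketch of the pipelining idea described above, assuming a method-of-lines right-hand side with access distance 1 (a 1D heat-equation stencil) and a simple 2-stage embedded pair (Heun's method with embedded Euler). The function and parameter names are illustrative, not the paper's actual implementation; the point is only that once a block of the first stage vector is computed, the neighboring block of the second stage can be evaluated while the data is still cache-resident:

```python
def f(y, i, h2):
    """Right-hand side component i: 1D Laplacian stencil with zero
    Dirichlet boundaries. Access distance is 1 (only i-1, i, i+1)."""
    left = y[i - 1] if i > 0 else 0.0
    right = y[i + 1] if i < len(y) - 1 else 0.0
    return (left - 2.0 * y[i] + right) / h2

def heun_step_pipelined(y, dt, h2, B=4):
    """One step of Heun's method with an embedded Euler error estimate.
    The two stage vectors are computed blockwise in a pipelined sweep:
    after stage 1 is finished on block j, stage 2 is evaluated on block
    j-1, which needs stage-1 values only up to the end of block j."""
    n = len(y)
    k1 = [0.0] * n
    k2 = [0.0] * n
    # Stage-2 argument y + dt*k1, formed on the fly per stencil point
    z = lambda i: y[i] + dt * k1[i]
    fz = lambda i: ((z(i - 1) if i > 0 else 0.0)
                    - 2.0 * z(i)
                    + (z(i + 1) if i < n - 1 else 0.0)) / h2
    nb = (n + B - 1) // B
    for j in range(nb + 1):                  # extra iteration drains the pipeline
        if j < nb:                           # stage 1 on block j
            for i in range(j * B, min((j + 1) * B, n)):
                k1[i] = f(y, i, h2)
        if j > 0:                            # stage 2 on block j-1
            for i in range((j - 1) * B, min(j * B, n)):
                k2[i] = fz(i)
    y_new = [y[i] + dt * 0.5 * (k1[i] + k2[i]) for i in range(n)]
    # Difference to the embedded Euler result y + dt*k1
    err = max(abs(dt * 0.5 * (k2[i] - k1[i])) for i in range(n))
    return y_new, err
```

Because the stencil reaches only one point beyond each block boundary, the pipelined sweep produces exactly the same stage vectors as computing each stage over the full grid in turn; what changes is the traversal order, which keeps each block of k1 in cache while k2 is formed from it.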