The efficiency of modern microprocessors is extremely sensitive to the structure and memory access pattern of the programs they execute. This sensitivity is caused by memory hierarchies, which were introduced to reduce average memory access times. In this paper, we consider embedded Runge-Kutta (RK) methods for solving the systems of ordinary differential equations that arise from the spatial discretization of partial differential equations, and we study their efficient implementation on modern microprocessors. We investigate several program variants that differ in execution order and storage scheme. In particular, we explore how the potential parallelism in the computation of the stage vectors can be exploited by a pipelining approach to improve the locality behavior of the RK implementations. Experiments show that this approach improves efficiency on several recent processors.
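To make the idea of reordering the stage vector computation concrete, here is a minimal Python sketch under stated assumptions that are not taken from the paper: the ODE system is a 1D heat equation discretized with second-order central differences (so component j of f reads only components j-1, j, j+1, an access distance of 1), the embedded method is the Bogacki-Shampine 3(2) pair, and all function names (`stage_block`, `step_stagewise`, `step_pipelined`) are hypothetical. `step_stagewise` computes each stage over the whole vector before starting the next one. `step_pipelined` splits the vector into blocks and sweeps a diagonal wavefront in which stage l works on block t - l at step t, so only a band of a few blocks per stage is active at a time. This is one plausible block-based pipelining order, not necessarily the authors' scheme.

```python
import math

# Bogacki-Shampine 3(2) embedded pair (assumed example method).
A = [[], [0.5], [0.0, 0.75], [2 / 9, 1 / 3, 4 / 9]]  # A[l][i] for i < l
B = [2 / 9, 1 / 3, 4 / 9, 0.0]                        # 3rd-order weights
BHAT = [7 / 24, 1 / 4, 1 / 3, 1 / 8]                  # 2nd-order weights
S = len(B)


def stage_block(l, lo, hi, y, v, h, dx2):
    """Compute stage vector v[l] on indices [lo, hi) for the assumed
    1D heat equation f_j(w) = (w[j-1] - 2 w[j] + w[j+1]) / dx^2 with
    zero Dirichlet boundary values."""
    n = len(y)

    def w(j):
        # Argument vector of stage l at index j: y + h * sum_{i<l} a_li v_i.
        if j < 0 or j >= n:
            return 0.0
        return y[j] + h * sum(A[l][i] * v[i][j] for i in range(l))

    for j in range(lo, hi):
        v[l][j] = (w(j - 1) - 2.0 * w(j) + w(j + 1)) / dx2


def combine_block(lo, hi, y, v, h, y_new, err):
    """New approximation and embedded error estimate on [lo, hi)."""
    for j in range(lo, hi):
        y_new[j] = y[j] + h * sum(B[i] * v[i][j] for i in range(S))
        err[j] = h * sum((B[i] - BHAT[i]) * v[i][j] for i in range(S))


def step_stagewise(y, h, dx2):
    """One RK step, computing each stage over the full vector in turn."""
    n = len(y)
    v = [[0.0] * n for _ in range(S)]
    y_new, err = [0.0] * n, [0.0] * n
    for l in range(S):
        stage_block(l, 0, n, y, v, h, dx2)
    combine_block(0, n, y, v, h, y_new, err)
    return y_new, err


def step_pipelined(y, h, dx2, bs):
    """One RK step in a block-wise diagonal (pipelined) order.

    At step t, stage l processes block k = t - l, stages in increasing
    order. Stage l on block k needs stage l-1 on block k+1 (access
    distance 1 <= bs), which was handled earlier in the same step.
    """
    n = len(y)
    nb = (n + bs - 1) // bs
    v = [[0.0] * n for _ in range(S)]
    y_new, err = [0.0] * n, [0.0] * n
    for t in range(nb + S - 1):
        for l in range(S):
            k = t - l
            if 0 <= k < nb:
                lo, hi = k * bs, min(n, (k + 1) * bs)
                stage_block(l, lo, hi, y, v, h, dx2)
                if l == S - 1:  # all stages of block k are now final
                    combine_block(lo, hi, y, v, h, y_new, err)
    return y_new, err


if __name__ == "__main__":
    n = 37
    dx = 1.0 / (n + 1)
    y0 = [math.sin(math.pi * (j + 1) * dx) for j in range(n)]
    h = 0.4 * dx * dx
    ref = step_stagewise(y0, h, dx * dx)
    print(all(step_pipelined(y0, h, dx * dx, bs) == ref for bs in (1, 4, 8)))
```

Because every stage component is computed by the same expression from the same inputs, both orders give bitwise identical results here; the pipelined order only changes which data are touched close together in time. Pure Python will not show the cache effects the paper measures. If f had a larger access distance d, the lag between consecutive stages would have to cover d components rather than one block of at least one component.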