Performance optimization of RK methods using block-based pipelining

Authors:
Matthias Korch; Thomas Rauber; Gudula Rünger
Affiliations:
Department of Mathematics and Physics, University of Bayreuth, Bayreuth, Germany; Department of Mathematics and Physics, University of Bayreuth, Bayreuth, Germany; Department of Computer Science, Technical University of Chemnitz, Chemnitz, Germany
Venue:
Performance analysis and grid computing
Year:
2004

Abstract

The efficiency of modern microprocessors is extremely sensitive to the structure and memory access pattern of the programs they execute. This sensitivity is caused by memory hierarchies, which were introduced to reduce average memory access times. In this paper, we consider embedded Runge-Kutta (RK) methods for the solution of ordinary differential equations arising from the space discretization of partial differential equations, and we study their efficient implementation on modern microprocessors. Different program variants with different execution orders and storage schemes are investigated. In particular, we explore how the potential parallelism in the stage vector computation can be exploited in a pipelining approach in order to improve the locality behavior of the RK implementations. Experiments show that this results in efficiency improvements on several recent processors.
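The pipelining idea can be illustrated with a small sketch. It is not the authors' implementation. Several things are assumptions made for illustration: the Bogacki-Shampine 3(2) embedded pair, a 1D heat equation discretized with a 3-point stencil (each component of f reads only its immediate neighbors, an access distance of 1), and all function names (`rhs_range`, `stage_arg`, `combine`, `step_naive`, `step_pipelined`). `step_naive` computes each stage over the whole system before starting the next stage. `step_pipelined` splits the system into blocks and, in sweep `k`, computes stage `l` on block `k - l`. This ordering is valid because the stage evaluation F_l on block b reads the argument vector Y_l only on blocks b-1, b and b+1, and Y_l on a block needs F_0, ..., F_{l-1} only on that same block. Each sweep therefore moves diagonally through (stage, block) pairs and touches only a few neighboring blocks. In a compiled implementation, that small working set is what could stay in cache. In Python, the sketch only demonstrates the execution order and shows that it reproduces the stage-by-stage result.

```python
import numpy as np

# Bogacki-Shampine 3(2) embedded pair; chosen here only for illustration.
A = [[0.0, 0.0, 0.0, 0.0],
     [1 / 2, 0.0, 0.0, 0.0],
     [0.0, 3 / 4, 0.0, 0.0],
     [2 / 9, 1 / 3, 4 / 9, 0.0]]
B_HIGH = [2 / 9, 1 / 3, 4 / 9, 0.0]     # order-3 weights (propagated solution)
B_LOW = [7 / 24, 1 / 4, 1 / 3, 1 / 8]   # order-2 weights (error estimate)
S = len(B_HIGH)                         # number of stages


def rhs_range(ypad, lo, hi, inv_dx2):
    """Hypothetical RHS: f_i(Y) for lo <= i < hi, 3-point Laplacian.

    ypad stores Y in ypad[1:n+1]; ypad[0] and ypad[n+1] are zero boundaries."""
    return (ypad[lo:hi] - 2.0 * ypad[lo + 1:hi + 1] + ypad[lo + 2:hi + 2]) * inv_dx2


def stage_arg(y, F, l, h, lo, hi):
    """Argument vector Y_l = y + h * sum_{i<l} a_{l,i} F_i on [lo, hi)."""
    acc = y[lo:hi].copy()
    for i in range(l):
        acc += h * A[l][i] * F[i, lo:hi]
    return acc


def combine(y, F, h, lo, hi):
    """New approximation and local error estimate on [lo, hi)."""
    y_new = y[lo:hi].copy()
    err = np.zeros(hi - lo)
    for l in range(S):
        y_new += h * B_HIGH[l] * F[l, lo:hi]
        err += h * (B_HIGH[l] - B_LOW[l]) * F[l, lo:hi]
    return y_new, err


def step_naive(y, h, inv_dx2):
    """Baseline order: every stage sweeps the whole system before the next."""
    n = len(y)
    ypad = np.zeros((S, n + 2))
    F = np.zeros((S, n))
    for l in range(S):
        ypad[l, 1:n + 1] = stage_arg(y, F, l, h, 0, n)
        F[l] = rhs_range(ypad[l], 0, n, inv_dx2)
    return combine(y, F, h, 0, n)


def step_pipelined(y, h, inv_dx2, bsize):
    """Block-pipelined order: sweep k handles stage l on block k - l.

    Assumes bsize >= 1, i.e. at least the access distance of rhs_range."""
    n = len(y)
    nb = -(-n // bsize)                 # number of blocks, rounded up
    ypad = np.zeros((S, n + 2))
    ypad[0, 1:n + 1] = y                # Y_0 = y needs no computation
    F = np.zeros((S, n))
    y_new, err = np.empty(n), np.empty(n)

    def bounds(b):
        return b * bsize, min((b + 1) * bsize, n)

    for k in range(nb + S - 1):
        for l in range(S):
            # Y_l on block k-l+1 needs F_0..F_{l-1} on that block only;
            # F_{l-1} there was produced earlier in this same sweep.
            b = k - l + 1
            if l >= 1 and 0 <= b < nb:
                lo, hi = bounds(b)
                ypad[l, lo + 1:hi + 1] = stage_arg(y, F, l, h, lo, hi)
            # F_l on block k-l reads Y_l on blocks k-l-1, k-l and k-l+1.
            b = k - l
            if 0 <= b < nb:
                lo, hi = bounds(b)
                F[l, lo:hi] = rhs_range(ypad[l], lo, hi, inv_dx2)
        # Block k-S+1 now has all S stages and can be finalized.
        b = k - S + 1
        if 0 <= b < nb:
            lo, hi = bounds(b)
            y_new[lo:hi], err[lo:hi] = combine(y, F, h, lo, hi)
    return y_new, err


if __name__ == "__main__":
    n = 1000
    dx = 1.0 / (n + 1)
    y0 = np.sin(np.pi * np.linspace(dx, 1.0 - dx, n))
    h = 0.2 * dx**2
    ref, _ = step_naive(y0, h, 1.0 / dx**2)
    out, _ = step_pipelined(y0, h, 1.0 / dx**2, bsize=64)
    print("max |pipelined - naive| =", np.max(np.abs(out - ref)))
```

Varying `bsize` changes only the order of the operations, not the result. In a compiled setting, it would control how much data each sweep touches, which is presumably the parameter to tune against cache sizes. The sketch makes no claim about the block sizes or storage schemes used in the paper.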