A mixed-precision algorithm for the solution of Lyapunov equations on hybrid CPU-GPU platforms

Authors:
Peter Benner; Pablo Ezzatti; Daniel Kressner; Enrique S. Quintana-Ortí; Alfredo Remón
Affiliations:
Max Planck Institute for Dynamics of Complex Technical Systems, Sandtorstr. 1, D-39106 Magdeburg, Germany; Centro de Cálculo-Instituto de la Computación, Universidad de la República, 11.300-Montevideo, Uruguay; Seminar für Angewandte Mathematik, ETHZ, CH-8092 Zürich, Switzerland; Dpto. de Ingeniería y Ciencia de Computadores, Universidad Jaime I, 12.071-Castellón, Spain; Dpto. de Ingeniería y Ciencia de Computadores, Universidad Jaime I, 12.071-Castellón, Spain
Venue:
Parallel Computing
Year:
2011

Abstract

We describe a hybrid Lyapunov solver based on the matrix sign function, in which the computationally intensive parts are accelerated on a graphics processor (GPU) while the remaining operations execute on a general-purpose multi-core processor (CPU). The initial stage of the iteration operates in single-precision arithmetic and returns a low-rank factor of an approximate solution. As the main computation in this stage consists of explicit matrix inversions, we propose a hybrid implementation of Gauss-Jordan elimination that uses look-ahead to overlap computations on the GPU and the CPU. To improve the approximate solution, we introduce an iterative refinement procedure that cheaply recovers full double-precision accuracy. In contrast to earlier approaches to iterative refinement for Lyapunov equations, this approach retains the low-rank factorization structure of the approximate solution. The combination of the two stages results in a mixed-precision algorithm that exploits the capabilities of both general-purpose CPUs and many-core GPUs and overlaps critical computations. Numerical experiments using real-world data on a platform equipped with two Intel Xeon QuadCore processors and an Nvidia Tesla C1060 show a significant efficiency gain of the hybrid method over a classical CPU implementation.
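
The two-stage idea in the abstract can be illustrated with a small CPU-only sketch in NumPy. This is not the authors' implementation, and several things in it are assumptions made for illustration: the equation is taken in the form A X + X A^T + Q = 0 with a stable A, the function names `gauss_jordan_inverse`, `lyap_sign`, and `lyap_mixed` are hypothetical, `float32` arithmetic on the CPU stands in for the single-precision GPU stage, and the Gauss-Jordan inversion is unblocked with no look-ahead. The refinement shown is a plain full-matrix variant, so unlike the procedure in the paper it does not retain a low-rank factorization of the solution.

```python
import numpy as np


def gauss_jordan_inverse(a):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = a.shape[0]
    # Work on the augmented matrix [A | I]; reducing A to I turns I into inv(A).
    aug = np.hstack([a, np.eye(n, dtype=a.dtype)])
    for k in range(n):
        # Partial pivoting: bring the largest entry of column k to the diagonal.
        p = k + int(np.argmax(np.abs(aug[k:, k])))
        if aug[p, k] == 0:
            raise np.linalg.LinAlgError("matrix is singular")
        if p != k:
            aug[[k, p]] = aug[[p, k]]
        aug[k] /= aug[k, k]
        # Eliminate column k from every other row (above and below the pivot).
        factors = aug[:, k].copy()
        factors[k] = 0
        aug -= np.outer(factors, aug[k])
    return aug[:, n:].copy()


def lyap_sign(a, q, dtype=np.float32, max_iter=50):
    """Approximate X in A X + X A^T + Q = 0 (A stable) by the Newton sign iteration.

    Assumption: no scaling is applied to accelerate convergence.
    """
    ak = a.astype(dtype)
    qk = q.astype(dtype)
    n = ak.shape[0]
    eye = np.eye(n, dtype=dtype)
    tol = 10 * n * np.finfo(dtype).eps
    for _ in range(max_iter):
        ainv = gauss_jordan_inverse(ak)
        qk = (qk + ainv @ qk @ ainv.T) / 2   # Q_k tends to 2 X
        ak = (ak + ainv) / 2                 # A_k tends to sign(A) = -I
        if np.linalg.norm(ak + eye) <= tol:
            break
    return qk / 2


def lyap_mixed(a, q, refine_steps=3):
    """Single-precision solve followed by refinement with double-precision residuals."""
    x = lyap_sign(a, q).astype(np.float64)
    for _ in range(refine_steps):
        r = a @ x + x @ a.T + q                  # residual in double precision
        x += lyap_sign(a, r).astype(np.float64)  # correction E: A E + E A^T + R = 0
    return x


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 6
    a = -2.0 * np.eye(n) + 0.1 * rng.standard_normal((n, n))  # stable test matrix
    b = rng.standard_normal((n, 2))
    q = b @ b.T
    for name, x in [("single", lyap_sign(a, q).astype(np.float64)),
                    ("mixed", lyap_mixed(a, q))]:
        rel = np.linalg.norm(a @ x + x @ a.T + q) / np.linalg.norm(q)
        print(name, "relative residual:", rel)
```

Each refinement step reuses the same single-precision solver with the residual as the right-hand side, so the only double-precision work is forming the residual and accumulating the correction. That split is what makes the refinement cheap relative to running the whole iteration in double precision.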