Performance models for the Spike banded linear system solver

Authors:
Murat Manguoglu;Faisal Saied;Ahmed Sameh;Ananth Grama
Affiliations:
Department of Computer Engineering, Middle East Technical University, Ankara, Turkey. E-mail: manguoglu@ceng.metu.edu.tr;Department of Computer Science, Purdue University, West Lafayette, IN, USA;Department of Computer Science, Purdue University, West Lafayette, IN, USA;Department of Computer Science, Purdue University, West Lafayette, IN, USA
Venue:
Scientific Programming
Year:
2011



Abstract

With the availability of large-scale parallel platforms comprising tens of thousands of processors and beyond, there is significant impetus for the development of scalable parallel sparse linear system solvers and preconditioners. An integral part of this design process is the development of performance models capable of predicting performance and providing accurate cost models for the solvers and preconditioners. There has been some work in the past on characterizing the performance of the iterative solvers themselves. In this paper, we investigate the problem of characterizing the performance and scalability of banded preconditioners. Recent work has demonstrated the superior convergence properties and robustness of banded preconditioners compared to the state-of-the-art ILU family of preconditioners as well as algebraic multigrid preconditioners. Furthermore, when used in conjunction with efficient banded solvers, banded preconditioners are capable of significantly faster time-to-solution. Our banded solver, the Truncated Spike algorithm, is specifically designed for parallel performance and tolerance to deep memory hierarchies. Its regular structure is also highly amenable to accurate performance characterization. Using these characteristics, we derive the following results in this paper: (i) we develop parallel formulations of the Truncated Spike solver, (ii) we develop a highly accurate pseudo-analytical parallel performance model for our solver, and (iii) we show excellent prediction capabilities of our model, based on which we argue the high scalability of our solver. Our pseudo-analytical performance model is based on analytical performance characterization of each phase of our solver. These analytical models are then parameterized using actual runtime information on target platforms. An important consequence of our performance models is that they reveal underlying performance bottlenecks in both serial and parallel formulations.
All of our results are validated on diverse heterogeneous multiclusters, platforms for which performance prediction is particularly challenging. Finally, we use our model to predict the scalability of the Spike algorithm on up to 65,536 cores. In this paper we extend the results presented at the Ninth International Symposium on Parallel and Distributed Computing.
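
The idea of a pseudo-analytical model (an analytical cost expression whose coefficients are fitted to measured runtimes) can be illustrated with a minimal sketch. This is not the model derived in the paper: the three cost terms (a compute term scaling as 1/p, a log2(p) communication term, and a fixed overhead), the function names, and the timing values are all assumptions chosen for illustration. The "measurements" are synthetic, generated from made-up coefficients so that the fit can be checked.

```python
import math

def phase_features(p):
    # Hypothetical cost terms for a fixed problem size on p processes:
    # compute work split p ways, an assumed log2(p) communication term,
    # and a fixed overhead. These are illustrative, not the paper's terms.
    return [1.0 / p, math.log2(p), 1.0]

def solve_linear(a, rhs):
    """Solve a small dense system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    m = [row[:] + [r] for row, r in zip(a, rhs)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def fit_model(procs, times):
    """Least-squares fit of the model coefficients via the normal equations."""
    rows = [phase_features(p) for p in procs]
    k = len(rows[0])
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    aty = [sum(r[i] * t for r, t in zip(rows, times)) for i in range(k)]
    return solve_linear(ata, aty)

def predict(coeffs, p):
    """Evaluate the fitted model at process count p."""
    return sum(c * f for c, f in zip(coeffs, phase_features(p)))

# Synthetic timings from made-up coefficients (NOT data from the paper).
true_coeffs = [120.0, 0.05, 0.5]
procs = [1, 2, 4, 8, 16, 32]
times = [predict(true_coeffs, p) for p in procs]

coeffs = fit_model(procs, times)
t_65536 = predict(coeffs, 65536)  # extrapolate far beyond the measured range
print(coeffs, t_65536)
```

With real measurements the fit would not be exact, and an extrapolation to 65,536 cores is only as trustworthy as the assumed cost terms. That is why a per-phase analytical characterization, as the abstract describes, matters more than the fitting step itself.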