Problems that involve large, sparse linear systems are ubiquitous in scientific computing, and there is a strong need to accelerate their solution. Hybrid CPU-GPU systems have recently emerged as a platform trend offering powerful computing capabilities, but it is not clear how such systems can best accelerate sparse solvers. We study how to make the best use of the CPU and the GPU to minimize the total time required to solve symmetric positive definite systems using the multifrontal method. We analyze the computation and communication costs of the multifrontal method on such hybrid systems to build timing performance models. We then propose workload distribution algorithms that use these models to determine whether each frontal matrix should be factored on the CPU or on the GPU, so as to minimize the total execution time of the overall computation. We provide theoretical analyses and numerical results to illustrate the characteristics and efficiency of the proposed algorithms. Because the performance models and workload distribution algorithms adapt to different CPUs and GPUs, we expect the applicability and significance of these techniques to continue to grow as heterogeneous hardware and software evolve.
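
To make the device-selection idea concrete, the sketch below illustrates a per-frontal-matrix decision rule of the kind the abstract describes: estimate the factorization time on each device from a simple flop-rate model, charge the GPU for host-device transfers, and assign the frontal matrix to whichever device the model predicts is faster. This is a minimal sketch under assumed parameters; the hardware rates, the flop-count approximation, and all function names are illustrative placeholders, not the paper's actual performance models.

    # Minimal sketch of a per-frontal-matrix CPU/GPU decision rule.
    # All rates below are hypothetical placeholders; real models would be
    # calibrated to the actual hardware, as the paper's adaptive approach implies.

    CPU_GFLOPS = 50.0      # assumed sustained CPU rate for dense Cholesky
    GPU_GFLOPS = 300.0     # assumed sustained GPU rate for dense Cholesky
    PCIE_GB_PER_S = 6.0    # assumed effective host-device transfer bandwidth

    def cholesky_flops(n: int) -> float:
        """Approximate flop count for dense Cholesky of an n-by-n frontal matrix."""
        return n ** 3 / 3.0

    def cpu_time(n: int) -> float:
        """Modeled time to factor the frontal matrix on the CPU."""
        return cholesky_flops(n) / (CPU_GFLOPS * 1e9)

    def gpu_time(n: int) -> float:
        """Modeled GPU factorization time, including moving the n-by-n
        double-precision frontal matrix to the device and the factor back."""
        transfer = 2 * (n * n * 8) / (PCIE_GB_PER_S * 1e9)
        return cholesky_flops(n) / (GPU_GFLOPS * 1e9) + transfer

    def choose_device(n: int) -> str:
        """Assign the frontal matrix to whichever device minimizes modeled time."""
        return "gpu" if gpu_time(n) < cpu_time(n) else "cpu"

    if __name__ == "__main__":
        for n in (128, 512, 2048, 8192):
            print(n, choose_device(n),
                  f"cpu={cpu_time(n):.5f}s gpu={gpu_time(n):.5f}s")

Under these assumed parameters, small frontal matrices stay on the CPU because the transfer cost dominates, while large ones go to the GPU, which reflects the communication-versus-computation trade-off the abstract's performance models capture.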