Problems that involve large, sparse linear systems are ubiquitous in scientific computing, and there is a strong need to accelerate their solution. Hybrid CPU-GPU systems have recently emerged as a platform trend offering powerful computing capabilities, but it is not clear how such systems can best accelerate sparse solvers. We study how to make the best use of the CPU and the GPU to minimize the total time required to solve symmetric positive definite systems using the multifrontal method. We analyze the computation and communication costs of the multifrontal method on such hybrid systems to build timing performance models. We then propose workload distribution algorithms that use these models to determine whether each frontal matrix should be factored on the CPU or on the GPU, so as to minimize the total execution time of the overall computation. We provide theoretical analyses and numerical results to illustrate the characteristics and efficiency of the proposed algorithms. Because the performance models and workload distribution algorithms adapt to different CPUs and GPUs, we expect the applicability and significance of these techniques to continue to grow as heterogeneous hardware and software evolve.
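
To make the device-selection idea concrete, the sketch below illustrates a per-frontal-matrix decision rule of the kind the abstract describes: estimate the factorization time on each device from a simple flop-rate model, charge the GPU for host-device transfers, and assign the frontal matrix to whichever device the model predicts is faster. This is a minimal sketch under assumed parameters; the hardware rates, the flop-count approximation, and all function names are illustrative placeholders, not the paper's actual performance models.

    # Minimal sketch of a per-frontal-matrix CPU/GPU decision rule.
    # All rates below are hypothetical placeholders; real models would be
    # calibrated to the actual hardware, as the paper's adaptive approach implies.

    CPU_GFLOPS = 50.0      # assumed sustained CPU rate for dense Cholesky
    GPU_GFLOPS = 300.0     # assumed sustained GPU rate for dense Cholesky
    PCIE_GB_PER_S = 6.0    # assumed effective host-device transfer bandwidth

    def cholesky_flops(n: int) -> float:
        """Approximate flop count for dense Cholesky of an n-by-n frontal matrix."""
        return n ** 3 / 3.0

    def cpu_time(n: int) -> float:
        """Modeled time to factor the frontal matrix on the CPU."""
        return cholesky_flops(n) / (CPU_GFLOPS * 1e9)

    def gpu_time(n: int) -> float:
        """Modeled GPU factorization time, including moving the n-by-n
        double-precision frontal matrix to the device and the factor back."""
        transfer = 2 * (n * n * 8) / (PCIE_GB_PER_S * 1e9)
        return cholesky_flops(n) / (GPU_GFLOPS * 1e9) + transfer

    def choose_device(n: int) -> str:
        """Assign the frontal matrix to whichever device minimizes modeled time."""
        return "gpu" if gpu_time(n) < cpu_time(n) else "cpu"

    if __name__ == "__main__":
        for n in (128, 512, 2048, 8192):
            print(n, choose_device(n),
                  f"cpu={cpu_time(n):.5f}s gpu={gpu_time(n):.5f}s")

Under these assumed parameters, small frontal matrices stay on the CPU because the transfer cost dominates, while large ones go to the GPU, which reflects the communication-versus-computation trade-off the abstract's performance models capture.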