An implementation of the tile QR factorization for a GPU and multiple CPUs

Authors:
Jakub Kurzak; Rajib Nath; Peng Du; Jack Dongarra
Affiliations:
University of Tennessee, Knoxville, TN (all authors)
Venue:
PARA'10 Proceedings of the 10th international conference on Applied Parallel and Scientific Computing - Volume 2
Year:
2010

Abstract

The tile QR factorization provides an efficient and scalable way to factor a dense matrix in parallel on multicore processors. This article presents an efficient implementation of the algorithm on a system with a powerful GPU and many multicore CPUs.
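
For readers unfamiliar with the algorithm, the following is a minimal sequential sketch of a tile QR factorization in Python with NumPy. It is not the authors' implementation and involves no GPU or task scheduling. The function name `tile_qr`, the tile size parameter `nb`, and the restriction to square matrices whose order is a multiple of `nb` are assumptions made for illustration. The kernel names in the comments (GEQRT, ORMQR, TSQRT, TSMQR) follow the convention commonly used for these four tile operations and do not come from the abstract. Only the triangular factor R is returned; Q is not accumulated.

```python
import numpy as np


def tile_qr(A, nb):
    """Return the upper triangular factor R of A, computed tile by tile.

    Illustrative sketch only: A must be square with order divisible by nb.
    """
    A = np.array(A, dtype=float)  # work on a copy
    n = A.shape[0]
    assert A.shape == (n, n) and n % nb == 0
    nt = n // nb  # number of tiles per row and per column

    def t(i):
        # Index range covered by tile i along one dimension.
        return slice(i * nb, (i + 1) * nb)

    for k in range(nt):
        # GEQRT: factor the diagonal tile.
        Q, R = np.linalg.qr(A[t(k), t(k)], mode="complete")
        A[t(k), t(k)] = R
        # ORMQR: apply Q^T to the tiles to the right of the diagonal tile.
        for j in range(k + 1, nt):
            A[t(k), t(j)] = Q.T @ A[t(k), t(j)]
        for i in range(k + 1, nt):
            # TSQRT: factor the diagonal tile stacked on top of tile (i, k).
            stacked = np.vstack([A[t(k), t(k)], A[t(i), t(k)]])
            Q2, R2 = np.linalg.qr(stacked, mode="complete")
            A[t(k), t(k)] = R2[:nb]
            A[t(i), t(k)] = 0.0
            # TSMQR: apply Q2^T to the matching pair of tiles in each
            # remaining column.
            for j in range(k + 1, nt):
                C = Q2.T @ np.vstack([A[t(k), t(j)], A[t(i), t(j)]])
                A[t(k), t(j)] = C[:nb]
                A[t(i), t(j)] = C[nb:]
    return A


rng = np.random.default_rng(0)
A0 = rng.standard_normal((8, 8))
R = tile_qr(A0, nb=4)
```

Within step `k`, the updates for different column indices `j` do not depend on one another. That independence is what allows the tile operations to be scheduled as parallel tasks across processing units.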