The design, implementation, and evaluation of Jade
ACM Transactions on Programming Languages and Systems (TOPLAS)
Parallel tiled QR factorization for multicore architectures
Concurrency and Computation: Practice & Experience
QR factorization for the Cell Broadband Engine
Scientific Programming - High Performance Computing with the Cell Broadband Engine
A Note on Auto-tuning GEMM for GPUs
ICCS '09 Proceedings of the 9th International Conference on Computational Science: Part I
Hierarchical Task-Based Programming With StarSs
International Journal of High Performance Computing Applications
Scheduling dense linear algebra operations on multicore processors
Concurrency and Computation: Practice & Experience
Towards dense linear algebra for hybrid GPU accelerated manycore systems
Parallel Computing
StarPU: a unified platform for task scheduling on heterogeneous multicore architectures
Concurrency and Computation: Practice & Experience - Euro-Par 2009
Accelerating GPU kernels for dense linear algebra
VECPAR'10 Proceedings of the 9th international conference on High performance computing for computational science
Hi-index | 0.00 |
The tile QR factorization provides an efficient and scalable way for factoring a dense matrix in parallel on multicore processors. This article presents a way of efficiently implementing the algorithm on a system with a powerful GPU and many multicore CPUs.