ScaLAPACK user's guide
Parallel Algorithms and Architectures
Parallel Algorithms and Architectures
Fast matrix multiplies using graphics hardware
Proceedings of the 2001 ACM/IEEE conference on Supercomputing
New Serial and Parallel Recursive QR Factorization Algorithms for SMP Systems
PARA '98 Proceedings of the 4th International Workshop on Applied Parallel Computing, Large Scale Scientific and Industrial Problems
Using modern graphics architectures for general-purpose computing: a framework and analysis
Proceedings of the 35th annual ACM/IEEE international symposium on Microarchitecture
Parallel Gaussian Elimination Using OpenMP and MPI
HPCS '02 Proceedings of the 16th Annual International Symposium on High Performance Computing Systems and Applications
Linear algebra operators for GPU implementation of numerical algorithms
ACM SIGGRAPH 2003 Papers
Parallel out-of-core computation and updating of the QR factorization
ACM Transactions on Mathematical Software (TOMS)
Understanding the efficiency of GPU algorithms for matrix-matrix multiplication
Proceedings of the ACM SIGGRAPH/EUROGRAPHICS conference on Graphics hardware
LU-GPU: Efficient Algorithms for Solving Dense Linear Systems on Graphics Hardware
SC '05 Proceedings of the 2005 ACM/IEEE conference on Supercomputing
Supermatrix out-of-order scheduling of matrix operations for SMP and multi-core architectures
Proceedings of the nineteenth annual ACM symposium on Parallel algorithms and architectures
Parallel tiled QR factorization for multicore architectures
Concurrency and Computation: Practice & Experience
Benchmarking GPUs to tune dense linear algebra
Proceedings of the 2008 ACM/IEEE conference on Supercomputing
Solving dense linear systems on platforms with multiple hardware accelerators
Proceedings of the 14th ACM SIGPLAN symposium on Principles and practice of parallel programming
Implementing a parallel matrix factorization library on the cell broadband engine
Scientific Programming - High Performance Computing with the Cell Broadband Engine
QR factorization for the Cell Broadband Engine
Scientific Programming - High Performance Computing with the Cell Broadband Engine
Out-of-Core Computation of the QR Factorization on Multi-core Processors
Euro-Par '09 Proceedings of the 15th International Euro-Par Conference on Parallel Processing
Applying recursion to serial and parallel QR factorization leads to better performance
IBM Journal of Research and Development
Scheduling dense linear algebra operations on multicore processors
Concurrency and Computation: Practice & Experience
Implementing linear algebra routines on multi-core processors with pipelining and a look ahead
PARA'06 Proceedings of the 8th international conference on Applied parallel computing: state of the art in scientific computing
Performance study of LU decomposition on the programmable GPU
HiPC'05 Proceedings of the 12th international conference on High Performance Computing
Hi-index | 7.29 |
Recent developments in high performance computer architecture have a significant effect on all fields of scientific computing. Linear algebra and especially the solution of linear systems of equations lie at the heart of many applications in scientific computing. This paper describes and analyzes three parallel versions of the dense direct methods such as the Gaussian elimination method and the LU form of Gaussian elimination that are used in linear system solving on a multicore using an OpenMP interface. More specifically, we present two naive parallel algorithms based on row block and row cyclic data distribution and we put special emphasis on presenting a third parallel algorithm based on the pipeline technique. Further, we propose an implementation of the pipelining technique in OpenMP. Experimental results on a multicore CPU show that the proposed OpenMP pipeline implementation achieves good overall performance compared to the other two naive parallel methods. Finally, in this work we propose a simple, fast and reasonably analytical model to predict the performance of the direct methods with the pipelining technique.