State-of-the-art dense linear algebra software, such as the LAPACK and ScaLAPACK libraries, suffers performance losses on multicore processors because it cannot fully exploit thread-level parallelism. At the same time, the coarse-grain dataflow model is gaining popularity as a paradigm for programming multicore architectures. This work examines implementations of classic dense linear algebra workloads, the Cholesky factorization, the QR factorization, and the LU factorization, using dynamic data-driven execution. Two emerging approaches to implementing coarse-grain dataflow are examined: the model of nested parallelism, represented by the Cilk framework, and the model of parallelism expressed through an arbitrary Directed Acyclic Graph, represented by the SMP Superscalar framework. Performance and coding effort are analyzed and compared against code manually parallelized at the thread level. Copyright © 2009 John Wiley & Sons, Ltd.
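To make the arbitrary-DAG model concrete, the sketch below enumerates the tasks of a tile Cholesky factorization (POTRF, TRSM, SYRK, GEMM kernels) on a t × t tile grid, derives dependencies from the tiles each task reads and writes, in the superscalar style of SMP Superscalar's dependence analysis, and executes the tasks in a data-driven order. This is an illustrative reconstruction under stated assumptions, not the paper's code; all function names are hypothetical.

```python
# Hypothetical sketch: the task DAG of a tile Cholesky factorization,
# executed in data-driven order. Tasks are named by kernel and tile
# indices; dependencies are inferred from read/write sets, mimicking
# the runtime dependence analysis of an SMP Superscalar-like system.

def cholesky_task_dag(t):
    """Enumerate tile Cholesky tasks on a t x t tile grid.
    Maps task name -> (tiles read, tiles written), in program order."""
    tasks = {}
    for k in range(t):
        tasks[("POTRF", k)] = ({(k, k)}, {(k, k)})
        for i in range(k + 1, t):
            tasks[("TRSM", i, k)] = ({(k, k), (i, k)}, {(i, k)})
        for i in range(k + 1, t):
            tasks[("SYRK", i, k)] = ({(i, k), (i, i)}, {(i, i)})
            for j in range(k + 1, i):
                tasks[("GEMM", i, j, k)] = (
                    {(i, k), (j, k), (i, j)}, {(i, j)})
    return tasks

def build_edges(tasks):
    """Superscalar-style dependence analysis: each task depends on the
    most recent earlier task that wrote a tile it accesses."""
    last_writer = {}
    edges = {name: set() for name in tasks}
    for name, (reads, writes) in tasks.items():  # program order
        for tile in reads | writes:
            if tile in last_writer:
                edges[name].add(last_writer[tile])
        for tile in writes:
            last_writer[tile] = name
    return edges

def data_driven_schedule(edges):
    """Run tasks as soon as their predecessors finish (Kahn's algorithm);
    a real runtime would dispatch each ready set to worker threads."""
    done, schedule = set(), []
    pending = dict(edges)
    while pending:
        ready = [n for n, preds in pending.items() if preds <= done]
        assert ready, "cycle in DAG"
        for n in ready:
            schedule.append(n)
            done.add(n)
            del pending[n]
    return schedule
```

Each pass of the scheduling loop corresponds to one wavefront of independent tasks, which is exactly the parallelism a DAG-based runtime exploits; the nested-parallelism (Cilk) model would instead recover a subset of this DAG through recursive spawn/sync structure.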